Nonlinear Methods of Approximation


V. N. Temlyakov

University  of  South  Carolina,  Columbia,  SC  29208,  USA 

Abstract. Our main interest in this paper is nonlinear approximation. The basic
idea behind nonlinear approximation is that the elements used in the approximation
do not come from a fixed linear space but are allowed to depend on the function
being approximated. While the scope of this paper is mostly theoretical, we should
note that this form of approximation appears in many numerical applications such
as adaptive PDE solvers, compression of images and signals, statistical classification,
and so on. The standard problem in this regard is the problem of m-term
approximation, where one fixes a basis and looks to approximate a target function by a
linear combination of m terms of the basis. When the basis is a wavelet basis or a
basis of other waveforms, this type of approximation is the starting point for
compression algorithms. We are interested in the quantitative aspects of this type
of approximation. Namely, we want to understand the properties (usually
smoothness) of the function which govern its rate of approximation in some given norm (or
metric). We are also interested in stable algorithms for finding good or near best
approximations using m terms. Some of our earlier work has introduced and analyzed
such algorithms. More recently, there has emerged another, more complicated form
of nonlinear approximation which we call highly nonlinear approximation. It takes
many forms but has the basic ingredient that a basis is replaced by a larger system of
functions that is usually redundant. Some types of approximation that fall into this
general category are mathematical frames, adaptive pursuit (or greedy algorithms)
and adaptive basis selection. Redundancy on the one hand offers much promise for
greater efficiency in terms of approximation rate, but on the other hand gives rise
to highly nontrivial theoretical and practical problems. With this motivation, our
recent work and current activity focus on nonlinear approximation both in the
classical form of m-term approximation (where several important problems remain
unsolved) and in the form of highly nonlinear approximation, where a theory is only
now emerging.
1. Introduction

We introduce some notation and orient the reader on the topics that we will be
discussing in this paper. We begin with the case where approximation takes place
in a Banach space $X$ equipped with a norm $\|\cdot\| := \|\cdot\|_X$.
We formulate our approximation problem in the following general way. We say a
set of functions $\mathcal{D}$ from $X$ is a dictionary if each $g \in \mathcal{D}$ has norm one
($\|g\|_X = 1$) and the closure of $\operatorname{span} \mathcal{D}$ coincides with the whole of $X$. We let
$\Sigma_m(\mathcal{D})$ denote the collection of all functions (elements) in $X$ which can be
expressed as a linear combination of at most $m$ elements of $\mathcal{D}$. Thus each function
$s \in \Sigma_m(\mathcal{D})$ can be written in the form

$$ s = \sum_{g \in \Lambda} c_g g, \qquad \Lambda \subset \mathcal{D}, \quad \#\Lambda \le m, $$

where the $c_g$ are real or complex numbers. In some cases, it may be possible to write
an element from $\Sigma_m(\mathcal{D})$ in this form in more than one way. The space $\Sigma_m(\mathcal{D})$ is
not linear: the sum of two functions from $\Sigma_m(\mathcal{D})$ is generally not in $\Sigma_m(\mathcal{D})$.
For a function $f \in X$ we define its approximation error

$$ \sigma_m(f, \mathcal{D})_X := \inf_{s \in \Sigma_m(\mathcal{D})} \|f - s\|_X, $$

and for a function class $F$

$$ \sigma_m(F, \mathcal{D})_X := \sup_{f \in F} \sigma_m(f, \mathcal{D})_X. $$

The classical example of this type of approximation is the case $X = L_p([0, 2\pi])$ and
$\mathcal{D} = \mathcal{B}$ an orthogonal basis for $X$. In particular, $\mathcal{B}$ can be taken as the
trigonometric system $\mathcal{T} := \{e^{ikx},\ k \in \mathbb{Z}\}$ or the Haar system properly normalized. The
first results on error estimates in $m$-term approximation showed an advantage of
$m$-term approximation over approximation by polynomials of order $m$. R. S. Ismagilov
[I] (1974) studied $m$-term trigonometric approximation of individual functions,
namely, the Bernoulli kernels

$$ F_r(x) = 2 \sum_{k=1}^{\infty} k^{-r} \cos(kx - r\pi/2). $$

He proved that

$$ \sigma_m(F_2, \mathcal{T})_{L_\infty} \le C_\epsilon m^{-6/5+\epsilon} $$

with arbitrary $\epsilon > 0$. It is known that the best approximation $E_m(\cdot)_{L_\infty}$ by
trigonometric polynomials of order $m$ in the $L_\infty$-norm has the asymptotic order
$E_m(F_2)_{L_\infty} \asymp 1/m$. Further results in $m$-term trigonometric approximation confirmed
the advantage of this type of nonlinear approximation over linear approximation. For
many traditional pairs of a function class $F$ and an orthogonal system $\mathcal{B}$ the orders of
$\sigma_m(F, \mathcal{B})_X$ are now known. The investigation of the case where $F$ is a standard Besov
class (with smoothness measured in $L_q$), $\mathcal{B} = \mathcal{T}$ and $X = L_p$ was completed in [DT1]. This
investigation required a new technique (see [DT1] and [KT1]) which uses deep results from
finite dimensional geometry. Thus it is an example of interaction between the theory of
nonlinear $m$-term approximation and contemporary functional analysis. We discuss these results in
detail in Section 6. In Section 6 we also consider a general optimization problem in
the spirit of Kolmogorov's widths. For the reader's convenience and as motivation for
nonlinear methods we give in Section 5 a brief discussion of optimization settings
in linear approximation. Let $\mathbb{D}$ be a collection of dictionaries. The classical
example of $\mathbb{D}$ is $\mathbb{O}$ = {orthonormal bases on a given domain}. The optimization
problem asks to find (if possible), for a given pair of a collection of dictionaries $\mathbb{D}$ and
a function class $F$, a dictionary $\mathcal{D} \in \mathbb{D}$ such that

$$ \sigma_m(F, \mathcal{D})_X \asymp \sigma_m(F, \mathbb{D})_X := \inf_{\mathcal{D} \in \mathbb{D}} \sigma_m(F, \mathcal{D})_X. $$

This problem is interesting and important for theoretical investigation and also for
practical applications, where we often want a dictionary $\mathcal{D}$ with a certain
structure (from a collection $\mathbb{D}$) and do not want to commit to a particular one. In
Section 6 we discuss only the theoretical aspect of this problem for the classical example
$\mathbb{D} = \mathbb{O}$.

The next problem that we propose to investigate is to find a universal dictionary
$\mathcal{D} \in \mathbb{D}$, i.e. one which is optimal for all $F$ from a given collection $\mathcal{F}$ of function
classes.

Definition 1.1. Let two collections, $\mathcal{F}$ of function classes and $\mathbb{D}$ of dictionaries, be
given. We say that $\mathcal{D} \in \mathbb{D}$ is universal for the pair $(\mathcal{F}, \mathbb{D})$ if there exists a constant
$C$, which may depend on $\mathcal{F}$, $\mathbb{D}$ and $X$, such that for any $F \in \mathcal{F}$ we have

$$ \sigma_m(F, \mathcal{D})_X \le C \sigma_m(F, \mathbb{D})_X. \tag{1.1} $$

It may happen that for a given pair $(\mathcal{F}, \mathbb{D})$ there is no universal dictionary. In
this case we define the index of universality and look for a dictionary which realizes
(in the sense of order) this index. Let $m$ be fixed. Take a dictionary $\mathcal{D} \in \mathbb{D}$ and for
a fixed $F \in \mathcal{F}$ find the minimal $N(m, \mathcal{D}, F)$ such that

$$ \sigma_{N(m, \mathcal{D}, F)}(F, \mathcal{D})_X \le \sigma_m(F, \mathbb{D})_X. $$

We define the index of universality by

$$ iu(\mathcal{F}, \mathbb{D}, m) := \inf_{\mathcal{D} \in \mathbb{D}} \sup_{F \in \mathcal{F}} \frac{N(m, \mathcal{D}, F)}{m}. $$


This is a new concept in nonlinear approximation. The following observation
motivates our interest in this setting. In practice we often do not know the exact
smoothness class $F$ from which our input function (signal, image) comes. Instead,
we often know that our function comes from a class of a certain structure, for
instance, an anisotropic Sobolev class. This is exactly the situation we are dealing with
in the universal dictionary setting. So, if for a collection $\mathcal{F}$ there exists a universal
dictionary $\mathcal{D} \in \mathbb{D}$, the situation is ideal: we can use this universal dictionary
$\mathcal{D}$ in all cases, and we know that it adjusts automatically to the best smoothness
class $F \in \mathcal{F}$ containing the function being approximated. If, on the other hand, a pair $(\mathcal{F}, \mathbb{D})$
does not admit a universal dictionary, we have a trade-off between universality and
accuracy, quantified by the index of universality. We discuss the universality results
in Section 7.
We discussed above best $m$-term approximation with regard to a dictionary $\mathcal{D}$ in
a Banach space $X$. The sequence $\{\sigma_m(f, \mathcal{D})_X\}$ gives lower estimates of accuracy
for any sequence of algorithms $A_m$ that map $X$ into $\Sigma_m(\mathcal{D})$, where, as above, $\Sigma_m(\mathcal{D})$
is the set of all functions in $X$ which can be expressed as a linear combination of
at most $m$ elements from $\mathcal{D}$. Thus, the sequences $\{\sigma_m(f, \mathcal{D})_X\}$ and $\{\sigma_m(F, \mathcal{D})_X\}$
may serve as the target accuracies in constructing approximating algorithms $A_m$.
It is clear that the best algorithm (if it exists) gives the error

$$ \|f - A_m(f, \mathcal{D})\|_X = \sigma_m(f, \mathcal{D})_X. \tag{1.2} $$

We call an algorithm $A_m$ near best, or near best for individual functions, if

$$ \|f - A_m(f, \mathcal{D})\|_X \le C(\mathcal{D}, X) \sigma_m(f, \mathcal{D})_X \tag{1.3} $$

for all $f \in X$. Similarly, we say that $A_m$ is near best for a function class $F$ if we
have for any $f \in F$

$$ \|f - A_m(f, \mathcal{D})\|_X \le C(F, \mathcal{D}, X) \sigma_m(F, \mathcal{D})_X. \tag{1.4} $$

It is clear that an algorithm $A_m$ satisfying (1.3) is excellent from the point of view
of accuracy: it provides near best approximation for every individual function and,
therefore, for any function class. The property (1.4) is weaker than (1.3) but is still
very good. We present in Section 2 some results on linear approximation of
individual functions and function classes. The corresponding results for nonlinear
approximation with regard to a basis are discussed in Section 3. Let a Banach space
$X$ with a basis $\Psi = \{\psi_k\}_{k=1}^{\infty}$, $\|\psi_k\| = 1$, $k = 1, 2, \dots$, be given. We consider the
following theoretical greedy algorithm that we call the Thresholding Greedy Algorithm
(TGA). For a given element $f \in X$ we consider the expansion

$$ f = \sum_{k=1}^{\infty} c_k(f) \psi_k. $$

We call a permutation $\rho$, $\rho(j) = k_j$, $j = 1, 2, \dots$,
of the positive integers decreasing, and write $\rho \in D(f)$, if

$$ |c_{k_1}(f)| \ge |c_{k_2}(f)| \ge \dots. $$

In the case of strict inequalities here $D(f)$ consists of only one permutation. We
define the $m$-th greedy approximant of $f$ with regard to the basis $\Psi$ corresponding
to a permutation $\rho \in D(f)$ by the formula

$$ G_m(f, \Psi) := G_m(f, \Psi, \rho) := \sum_{j=1}^{m} c_{k_j}(f) \psi_{k_j}. $$

It is a simple algorithm which describes a theoretical scheme (it is not
computationally ready) for $m$-term approximation of an element $f$.
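
As an illustration only, the thresholding step can be sketched in Python for a finite coefficient vector. The sketch assumes that the coefficients $c_k(f)$ are already available as a finite list (in the text they form an infinite sequence, which is why the scheme is called theoretical). The function name `greedy_approximant` and the tie-breaking rule are assumptions made for this example and are not part of the definition.

```python
def greedy_approximant(coeffs, m):
    """Return the coefficients of the m-th greedy approximant.

    coeffs[k] plays the role of c_k(f) in the expansion of f with respect to
    the basis. The m coefficients of largest absolute value are kept and the
    rest are set to zero. Ties are broken by the smaller index, which selects
    one particular decreasing permutation from D(f).
    """
    # Indices sorted by decreasing |c_k|, then by increasing k on ties.
    order = sorted(range(len(coeffs)), key=lambda k: (-abs(coeffs[k]), k))
    kept = set(order[:m])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


# Keep the two largest coefficients of a four-term expansion.
print(greedy_approximant([0.5, -3.0, 1.0, 2.0], 2))
```

When several coefficients have the same absolute value, a different tie-breaking rule gives a different element of $D(f)$ and possibly a different approximant.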

We have discussed above the general optimization setting for finding a good basis
for nonlinear approximation. On the basis of this discussion we suggest a three-step
strategy for finding a good basis (dictionary) for nonlinear $m$-term approximation.
The first step consists of solving an optimization problem for a given function
class $F$, where we optimize over a collection $\mathbb{D}$ of bases (dictionaries). The second
step is devoted to finding a universal basis (dictionary) $\mathcal{D}_u \in \mathbb{D}$ for a given pair
$(\mathcal{F}, \mathbb{D})$ of collections: $\mathcal{F}$ of function classes and $\mathbb{D}$ of bases (dictionaries). The third
step deals with constructing a theoretical algorithm that realizes near best $m$-term
approximation with regard to $\mathcal{D}_u$ for function classes from $\mathcal{F}$. We worked
this strategy out in the model case of anisotropic function classes and the set
of orthogonal bases. The results are positive. We constructed a natural
tensor-product-wavelet type basis and proved that it is universal. Moreover, we proved
that the Thresholding Greedy Algorithm realizes near best $m$-term approximation with
regard to this basis for all anisotropic function classes. We discuss these results in
Section 7.

It is also very important to find analogs of $G_m(\cdot, \Psi)$ in the case of a general
dictionary $\mathcal{D}$ and to study their efficiency. We start this discussion by confining
ourselves to Hilbert spaces. We first define the Pure Greedy Algorithm (PGA) in a
Hilbert space $H$. We describe this algorithm for a general dictionary $\mathcal{D}$. If $f \in H$,
we let $g(f) \in \mathcal{D}$ be an element from $\mathcal{D}$ which maximizes $|\langle f, g \rangle|$. We shall assume for
simplicity that such a maximizer exists; if not, suitable modifications are necessary
(see the Weak Greedy Algorithm below) in the algorithm that follows. We define

$$ G(f, \mathcal{D}) := \langle f, g(f) \rangle g(f) $$

and

$$ R(f, \mathcal{D}) := f - G(f, \mathcal{D}). $$

Pure Greedy Algorithm (PGA). We define $R_0(f, \mathcal{D}) := f$ and $G_0(f, \mathcal{D}) := 0$.
Then, for each $m \ge 1$, we inductively define

$$ G_m(f, \mathcal{D}) := G_{m-1}(f, \mathcal{D}) + G(R_{m-1}(f, \mathcal{D}), \mathcal{D}), $$

$$ R_m(f, \mathcal{D}) := f - G_m(f, \mathcal{D}) = R(R_{m-1}(f, \mathcal{D}), \mathcal{D}). $$

In Section 8 we consider the problem of efficiency of Pure Greedy Algorithms
with regard to general dictionaries in a Hilbert space.
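
The recursion above can be sketched in Python under simplifying assumptions: the Hilbert space is taken to be $\mathbb{R}^n$ with the Euclidean inner product, and the dictionary is a finite list of unit vectors, so that a maximizer of $|\langle f, g \rangle|$ always exists. The names `dot` and `pure_greedy` are hypothetical and serve only this illustration.

```python
import math


def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def pure_greedy(f, dictionary, steps):
    """Run `steps` iterations of PGA and return (G_m, R_m) with m = steps.

    `dictionary` is assumed to be a finite list of unit vectors in R^n.
    """
    residual = list(f)              # R_0 = f
    approximant = [0.0] * len(f)    # G_0 = 0
    for _ in range(steps):
        # g(R_{m-1}): a dictionary element maximizing |<R_{m-1}, g>|.
        g = max(dictionary, key=lambda d: abs(dot(residual, d)))
        coef = dot(residual, g)
        # G_m = G_{m-1} + <R_{m-1}, g> g and R_m = R_{m-1} - <R_{m-1}, g> g.
        approximant = [a + coef * x for a, x in zip(approximant, g)]
        residual = [r - coef * x for r, x in zip(residual, g)]
    return approximant, residual


s = 1 / math.sqrt(2)
D = [(1.0, 0.0), (0.0, 1.0), (s, s)]   # a redundant dictionary in R^2
G2, R2 = pure_greedy([3.0, 1.0], D, 2)
print(G2, R2)
```

For an infinite dictionary the maximization step is not available in this form, which is one reason the procedure is described in the text as a theoretical algorithm.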

Remark 1.1. In this paper, we study only theoretical aspects of the efficiency of
$m$-term approximation and possible ways to realize this efficiency. The above defined
"greedy algorithm" gives a procedure to construct an approximant which turns out
to be a good approximant. The procedure of constructing a greedy approximant
is not a numerical algorithm ready for computational implementation. Therefore
it would be more precise to call this procedure a "theoretical greedy algorithm" or
"stepwise optimizing process". Keeping this remark in mind we, however, use the term
"greedy algorithm" in this paper because it has been used in previous papers and
has become a standard name for procedures like the above and for more general
procedures of this type (see for instance [D], [DT2]). Following [DDGS] we call an
algorithm "incremental" if at step $m$ we add at most one more element $\varphi_m \in \mathcal{D}$ and
approximate by a linear combination $c_1 \varphi_1 + \dots + c_m \varphi_m$. We use the term "greedy
type" for an incremental algorithm with $\varphi_m$ chosen to maximize a given functional
$F(f_{m-1}, g)$ over $g \in \mathcal{D}$, where $f_{m-1}$ is the residual after the $(m-1)$th step of the
algorithm. The form of $F(\cdot, \cdot)$ determines the kind of greedy algorithm. We use
the term "weak greedy" for an incremental algorithm with $\varphi_m$ satisfying a weaker
condition than maximizing the given functional. For instance,

$$ F(f_{m-1}, \varphi_m) \ge t_m \sup_{g \in \mathcal{D}} F(f_{m-1}, g), \qquad 0 \le t_m \le 1. $$

The sequence $\tau := \{t_k\}_{k=1}^{\infty}$ is called the "weakness" sequence.

We begin with two definitions in the spirit of inequalities (1.3) and (1.4).

Definition 1.2. We call a dictionary $\mathcal{D}$ a greedy dictionary for a Hilbert space $H$ if
for any $f \in H$ and any realization of the Pure Greedy Algorithm we have

$$ \|f - G_m(f, \mathcal{D})\| \le C(\mathcal{D}, H) \sigma_m(f, \mathcal{D}). $$

Definition 1.3. Let $r > 0$ be given. We call a dictionary $\mathcal{D}$ an $r$-greedy dictionary
for $H$ if $\mathcal{D}$ possesses the property (G): for any $f \in H$ such that

$$ \sigma_m(f, \mathcal{D}) \le m^{-r}, \qquad m = 1, 2, \dots, $$

we have

$$ \|f - G_m(f, \mathcal{D})\| \le C(r, \mathcal{D}) m^{-r}, \qquad m = 1, 2, \dots. $$

A simple example of a greedy dictionary is an orthonormal basis for $H$. There is also a
nontrivial classical example of a greedy dictionary. Let $\Pi$ be the set of functions from
$L_2([0, 1]^2)$ of the form $u(x_1)v(x_2)$ with unit $L_2$-norm. Then for this dictionary and
$H = L_2([0, 1]^2)$ we have for each $f \in H$

$$ \|f - G_m(f, \Pi)\| = \sigma_m(f, \Pi). $$

This result and related results will be discussed in Section 11. We will discuss in
Section 8 the general setting for the Pure Greedy Algorithm and modifications of PGA,
some of which we define now. For other modifications see [DT2] and [DDGS].

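Let a sequence $\tau = \{t_k\}_{k=1}^{\infty}$, $0 \le t_k \le 1$, be given. Following [T20] we define
the Weak Greedy Algorithm.

Weak Greedy Algorithm (WGA). We define $f_0^\tau := f$. Then for each $m \ge 1$,
we inductively define:

1). $\varphi_m^\tau \in \mathcal{D}$ is any element satisfying

$$ |\langle f_{m-1}^\tau, \varphi_m^\tau \rangle| \ge t_m \sup_{g \in \mathcal{D}} |\langle f_{m-1}^\tau, g \rangle|; $$

2).

$$ f_m^\tau := f_{m-1}^\tau - \langle f_{m-1}^\tau, \varphi_m^\tau \rangle \varphi_m^\tau; $$

3).

$$ G_m^\tau(f, \mathcal{D}) := \sum_{j=1}^{m} \langle f_{j-1}^\tau, \varphi_j^\tau \rangle \varphi_j^\tau. $$

We note that in the particular case $t_k = t$, $k = 1, 2, \dots$, this algorithm was
considered in [J1]. We present convergence results and error estimates for PGA
and WGA in Section 8.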
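
The three steps of WGA can be sketched in the same finite-dimensional setting. The sketch assumes $H = \mathbb{R}^n$ and a finite dictionary of unit vectors, and it resolves the freedom in step 1) by taking the first dictionary element (in list order) that meets the weak condition. That selection rule and the names `dot` and `weak_greedy` are assumptions of this example; any other element satisfying the inequality would be an equally valid realization.

```python
import math


def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def weak_greedy(f, dictionary, weakness):
    """Run one WGA step per entry of `weakness` and return (G_m, f_m).

    `weakness` is a finite prefix t_1, ..., t_m of the weakness sequence,
    each entry in [0, 1]. `dictionary` is a finite list of unit vectors.
    """
    residual = list(f)              # f_0 = f
    approximant = [0.0] * len(f)
    for t in weakness:
        best = max(abs(dot(residual, g)) for g in dictionary)
        # Step 1: any element with |<f_{m-1}, phi>| >= t_m * sup is allowed;
        # here we take the first one in list order (an arbitrary choice).
        phi = next(g for g in dictionary if abs(dot(residual, g)) >= t * best)
        coef = dot(residual, phi)
        # Steps 2 and 3: update the residual and the approximant.
        residual = [r - coef * x for r, x in zip(residual, phi)]
        approximant = [a + coef * x for a, x in zip(approximant, phi)]
    return approximant, residual


s = 1 / math.sqrt(2)
D = [(s, s), (1.0, 0.0), (0.0, 1.0)]
print(weak_greedy([3.0, 1.0], D, [0.9]))       # accepts the diagonal element
print(weak_greedy([3.0, 1.0], D, [1.0, 1.0]))  # t = 1 behaves like PGA
```

With $t_m = 1$ the weak condition forces a true maximizer, so in this example the run with weakness sequence `[1.0, 1.0]` selects the same elements as the Pure Greedy Algorithm.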
Much less is known about greedy algorithms in the case of a Banach space $X$. We
discuss here two generalizations of PGA from a Hilbert space $H$ to a Banach
space $X$. The first one is a straightforward generalization of PGA. We call it the Pure
Greedy Algorithm, or the $X$-Greedy Algorithm when we want to point out the Banach
space. For a given $X$ and $\mathcal{D}$ we define $G(f, \mathcal{D}, X) := \alpha(f) g(f)$, where $\alpha(f) \in \mathbb{R}$ and
$g(f) \in \mathcal{D}$ satisfy (we assume existence) the relation

$$ \min_{\alpha \in \mathbb{R},\, g \in \mathcal{D}} \|f - \alpha g\|_X = \|f - \alpha(f) g(f)\|_X. $$

$X$-Greedy Algorithm. We define $R_0(f, \mathcal{D}, X) := f$ and $G_0(f, \mathcal{D}, X) := 0$. Then,
for each $m \ge 1$, we inductively define

$$ R_m(f) := R_m(f, \mathcal{D}, X) := R_{m-1}(f) - G(R_{m-1}(f), \mathcal{D}, X), $$

$$ G_m(f, \mathcal{D}, X) := G_{m-1}(f, \mathcal{D}, X) + G(R_{m-1}(f), \mathcal{D}, X). $$

The second version of PGA in a Banach space is based on the concept of a peak
functional (norming functional). We call it the Dual Greedy Algorithm (DGA). Let a
dictionary $\mathcal{D}$ in $X$ be given. Take an element $f \in X$ and find a peak functional $F_f$,
i.e. a functional such that $\|F_f\|_{X'} = 1$ and $F_f(f) = \|f\|_X$. The existence of such a
functional follows from the Hahn-Banach theorem. Now the basic step of PGA is
modified to the following. Assume that there exists $g_f \in \mathcal{D}$ such that

$$ |F_f(g_f)| = \max_{g \in \mathcal{D}} |F_f(g)|. $$

We take this $g_f$ and solve one more optimization problem: find a number $a$ such
that

$$ \|f - a g_f\|_X = \min_{b} \|f - b g_f\|_X. $$

We put

$$ G^D(f, \mathcal{D}) := a g_f, \qquad R^D(f, \mathcal{D}) := f - a g_f. $$

Repeating this step $m$ times we get $G_m^D(f, \mathcal{D})$ as an approximant and $R_m^D(f, \mathcal{D})$ as
a residual. Some results on greedy algorithms in Banach spaces are presented in
Section 9.
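
A minimal sketch of the Dual Greedy Algorithm follows, under assumptions that are not in the text: $X$ is the finite-dimensional space $\ell_p^n$ with $1 < p < \infty$, the dictionary is a finite list of vectors of unit $\ell_p$-norm, and the one-dimensional minimization over $b$ is carried out numerically by ternary search, which relies on the convexity of $b \mapsto \|f - b g\|_p$. In $\ell_p$ the peak functional of $f$ is given by the vector with entries $\operatorname{sign}(f_i)|f_i|^{p-1}/\|f\|_p^{p-1}$. All function names are hypothetical.

```python
import math


def lp_norm(x, p):
    """The l_p norm of a finite vector, 1 < p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def peak_functional(f, p):
    """Vector representing the norming functional of f != 0 in l_p."""
    nf = lp_norm(f, p)
    return [math.copysign(abs(t) ** (p - 1), t) / nf ** (p - 1) for t in f]


def best_multiple(f, g, p, iters=200):
    """Approximate minimizer of b -> ||f - b g||_p by ternary search.

    A minimizer lies in [-2||f||, 2||f||] when ||g|| = 1, and the objective
    is convex in b, so shrinking the bracket by thirds is justified.
    """
    def err(b):
        return lp_norm([x - b * y for x, y in zip(f, g)], p)

    lo, hi = -2.0 * lp_norm(f, p), 2.0 * lp_norm(f, p)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if err(m1) <= err(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


def dual_greedy(f, dictionary, p, steps):
    """Run `steps` iterations of DGA in l_p; return (approximant, residual)."""
    residual = list(f)
    approximant = [0.0] * len(f)
    for _ in range(steps):
        if lp_norm(residual, p) == 0.0:
            break                                   # nothing left to approximate
        F = peak_functional(residual, p)
        # g_f maximizes |F_f(g)| over the finite dictionary.
        g = max(dictionary, key=lambda d: abs(sum(a * b for a, b in zip(F, d))))
        a = best_multiple(residual, g, p)
        approximant = [u + a * y for u, y in zip(approximant, g)]
        residual = [r - a * y for r, y in zip(residual, g)]
    return approximant, residual


D = [(1.0, 0.0), (0.0, 1.0)]
print(dual_greedy([3.0, 1.0], D, 2, 2))
```

For $p = 2$ the peak functional is $\langle \cdot, f \rangle / \|f\|_2$ and the sketch reduces, up to the accuracy of the line search, to the Pure Greedy Algorithm.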

In Section 10 we discuss some results on how the entropy numbers can be used
in estimating from below the quantities $\{\sigma_m(F, \Psi)\}$. The idea of estimating the
Kolmogorov widths from below using the entropy numbers is well known (see [L],
[C], [Pi]). We used this idea in [T16] for estimating nonlinear best $m$-term
approximation. We proved that for good systems $\Psi$ the estimate

$$ \epsilon_n(F, X) \gg n^{-a} (\log n)^{b}, \qquad a > 0, \quad b \in \mathbb{R}, $$

for the entropy numbers implies the same estimate for best $m$-term approximation:

$$ \sigma_m(F, \Psi)_X \gg m^{-a} (\log m)^{b}. $$

See Section 10 for more detail.

We mention two survey papers, [Bab] and [D], where one can find a detailed
discussion of numerical applications.
2. Approximation by Linear Methods. Individual functions

Let us consider a Banach space $X$ with a basis $\Psi = \{\psi_k\}_{k=1}^{\infty}$, $\|\psi_k\| = 1$,
$k = 1, 2, \dots$. For a given element $f \in X$ we consider the expansion

$$ f = \sum_{k=1}^{\infty} c_k(f) \psi_k $$

and the corresponding partial sums

$$ S_n(f, \Psi) := \sum_{k=1}^{n} c_k(f) \psi_k. $$

In order to understand the efficiency of approximating by $S_n$ we introduce best
approximations with regard to $\operatorname{span}\{\psi_1, \dots, \psi_n\}$:

$$ E_n(f, \Psi)_X := \inf_{a_k} \Big\| f - \sum_{k=1}^{n} a_k \psi_k \Big\|_X. $$

It is well known (see [LT]) that for a basis $\Psi$ the operators $S_n$ are uniformly bounded as
operators from $X$ to $X$. Therefore, we have for any $f, g \in X$

$$ \|S_n(f) - S_n(g)\|_X \le C(X, \Psi) \|f - g\|_X, $$

and for any $f \in X$

$$ \|f - S_n(f, \Psi)\|_X \le C(X, \Psi) E_n(f, \Psi)_X. $$

This means that the partial sums method provides near best approximation for
any individual $f$. Let us consider the classical example $\Psi = \mathcal{T}$, the trigonometric
system, and $X = L_p$, $1 \le p \le \infty$. The system $\mathcal{T}$ is an orthonormal basis of $L_2$ and,
therefore, the orthoprojector $S_n$ realizes the best approximation in $L_2$. By the Riesz
theorem (see [Z]) we know that $\mathcal{T}$ is a basis for $L_p$, $1 < p < \infty$, and thus the Fourier
sums realize near best trigonometric approximation in $L_p$, $1 < p < \infty$. It is well
known that $\mathcal{T}$ is not a basis for $L_1$ and $L_\infty$. In this case we have the Lebesgue
inequality:

$$ \|f - S_n(f, \mathcal{T})\|_p \le C \ln(n + 2) E_n(f, \mathcal{T})_p, \qquad p = 1, \infty. $$

The extra factor $\ln(n + 2)$ grows slowly with $n$, but nonetheless there are different
settings in which attempts have been made to get rid of it. We
will mention some of them. One can replace the partial sum $S_n(f, \mathcal{T})$ by the de la
Vallée-Poussin operator

$$ V_n(f, \mathcal{T}) := \frac{1}{n} \sum_{j=n}^{2n-1} S_j(f, \mathcal{T}). $$

It is not an orthoprojector anymore but one has the estimate

$$ \|f - V_n(f, \mathcal{T})\|_p \le 4 E_n(f, \mathcal{T})_p, \qquad p = 1, \infty, $$
that is good if $\{E_n(f, \mathcal{T})_p\}$ does not decrease fast (note that $V_n(f, \mathcal{T})$ is a
trigonometric polynomial of degree $2n - 1$). The following estimate was obtained by
Oskolkov [O2] for $p = \infty$:

$$ \|f - S_n(f, \mathcal{T})\|_\infty \le C \sum_{j=n}^{2n} \frac{E_j(f, \mathcal{T})_\infty}{j - n + 1}. $$

We note also that in the case of $p = \infty$ an extra $\ln(n + 2)$ appears not only in the
estimates for individual functions as above but also for function classes. We present
here some well known results for the Sobolev classes

$$ W_q^r := \{ f : f^{(r-1)} \text{ is absolutely continuous},\ \|f^{(r)}\|_q \le 1 \}. $$

Kolmogorov proved that

$$ \sup_{f \in W_\infty^r} \|f - S_n(f, \mathcal{T})\|_\infty = \frac{4}{\pi^2} \frac{\ln n}{n^{r}} + O(n^{-r}). $$

Favard, Akhiezer and Krein (see [Tim]) proved the equality

$$ \sup_{f \in W_\infty^r} E_n(f, \mathcal{T})_\infty = K_r (n + 1)^{-r}, $$

where $K_r$ is a number depending on $r$.

We discuss an interplay between approximation of individual functions and
function classes. In this section we discuss certain aspects of the following question.
Suppose that $F$ is a function class and $\{\delta_n(F)\}_{n=1}^{\infty}$ is a corresponding sequence of
extremal quantities. In this section we take $\delta_n(F) := \sup_{f \in F} \delta_n(f)$ to be the
supremum $e_n(F)$ or $E_n(F)$ of the best approximation in the uniform norm of functions
in $F$ by algebraic ($e_n(\cdot)$) or trigonometric ($E_n(\cdot)$) polynomials of order $n$. In
Section 5 we will consider the case $\delta_n(F) = d_n(F)$, the sequence of the Kolmogorov
widths of the class $F$. We discuss the question of the extent to which the sequence
$\{\delta_n(F)\}_{n=1}^{\infty}$, which is connected with the whole function class $F$, characterizes the
corresponding properties of individual functions in $F$. In this section we discuss
the question of the existence in $F$ of a function $f$ such that

$$ \lim_{n \to \infty} \delta_n(f) / \delta_n(F) = 1. $$

The first result in this direction is apparently due to Lebesgue. In [Le] he proved
the equality

$$ \sup_{\|f\|_\infty \le M} E_n(f)_\infty = M, $$

where the supremum is taken over continuous functions. This equality in combination with
the Weierstrass theorem shows that in the class of all continuous functions bounded
by the number $M$ there is no asymptotically extremal function.

Let us make a historical remark due to Nikol'skii (see [N2]). S. N. Bernstein
discussed the role of function classes in constructive approximation in the opening
session of his seminar on Approximation Theory (Moscow, Spring 1945). His general
attitude to the role of studying the sequences $E_n(F) := \sup_{f \in F} E_n(f)$ for a given
function class $F$ was skeptical. One of his arguments was that the sequence $\{E_n(F)\}$
may not reflect the behavior of $\{E_n(f)\}$ for any individual $f \in F$, because usually
the extremal function that realizes $\sup_{f \in F} E_n(f)$ depends on $n$. He formulated the
problem of studying

$$ \sup_{f \in F} \limsup_{n \to \infty} \frac{E_n(f)}{E_n(F)} \qquad \text{and} \qquad \sup_{f \in F} \liminf_{n \to \infty} \frac{E_n(f)}{E_n(F)} $$

and their analogs for approximation by algebraic polynomials for some function
classes. In particular he thought that the function $|x|$ is an extremal function in
the sense of the above quantities in the class $\mathrm{Lip}_1 1$ for approximation by algebraic
polynomials in the uniform norm. However, it turned out not to be the case.
S. M. Nikol'skii [N2] proved in 1946 that for the classes $W_\infty^r$ there is a function $f \in W_\infty^r$
such that

$$ \limsup_{n \to \infty} E_n(f) / E_n(W_\infty^r) = 1. $$

It was proved in [T1], [T2] that for the class $W_\infty^r$ there exists a function $f \in W_\infty^r$
such that

$$ \lim_{n \to \infty} E_n(f) / E_n(W_\infty^r) = 1. \tag{2.1} $$

Further results and some generalizations are obtained in [T3], [T5]. It is interesting
to compare the above result (2.1) with the following result of Oskolkov [O1]:

$$ \max_{f \in W_\infty^r} \liminf_{n \to \infty} \Big( \|f - S_n(f, \mathcal{T})\|_\infty \Big/ \sup_{f \in W_\infty^r} \|f - S_n(f, \mathcal{T})\|_\infty \Big) = 1/2. $$

Open problem 2.1. Is it possible to extend (2.1) from $W_\infty^r$ to $W^r H^\omega$ with
arbitrary modulus of continuity $\omega$? We define here

$$ W^r H^\omega := \{ f : |f^{(r)}(x) - f^{(r)}(y)| \le \omega(|x - y|) \text{ for all } x, y \}. $$

3. Greedy Approximation with regard to bases

3.1. Greedy Bases. We will study the algorithms $G_m(f, \Psi, \rho)$ defined in the
Introduction. In order to understand the efficiency of this algorithm we compare its
accuracy with the best possible accuracy $\sigma_m(f, \Psi)$ when an approximant is a linear
combination of $m$ terms from $\Psi$. The best we can achieve with the algorithm $G_m$
is

$$ \|f - G_m(f, \Psi, \rho)\| = \sigma_m(f, \Psi), $$

or a little weaker

$$ \|f - G_m(f, \Psi, \rho)\| \le G \sigma_m(f, \Psi) \tag{3.1} $$

for all elements $f \in X$ with a constant $G = C(X, \Psi)$ independent of $f$ and $m$.

Definition 3.1. We call a basis $\Psi$ a greedy basis if for every $f \in X$ there exists a
permutation $\rho \in D(f)$ such that (3.1) holds.

The following proposition has been proved in [KoT1].

Proposition 3.1. If $\Psi$ is a greedy basis then (3.1) holds for any permutation
$\rho \in D(f)$.

We will discuss the two most interesting cases of a basis $\Psi$: the Haar basis $\mathcal{H}$
as a representative of wavelet type bases and the trigonometric system $\mathcal{T}$ as a
representative of uniformly bounded orthonormal bases.

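Denote by $\mathcal{H}_p := \{H_k^p\}_{k=1}^{\infty}$ the Haar basis on $[0, 1)$ normalized in $L_p(0, 1)$: $H_1^p = 1$
on $[0, 1)$ and, for $k = 2^n + l$, $n = 0, 1, \dots$, $l = 1, 2, \dots, 2^n$,

$$ H_k^p(x) = \begin{cases} 2^{n/p}, & x \in [(2l - 2) 2^{-n-1}, (2l - 1) 2^{-n-1}), \\ -2^{n/p}, & x \in [(2l - 1) 2^{-n-1}, 2l \, 2^{-n-1}), \\ 0, & \text{otherwise.} \end{cases} $$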
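
The indexing $k = 2^n + l$ is easy to get wrong, so a short Python sketch of the definition may help. It evaluates $H_k^p$ pointwise and checks numerically, by a midpoint rule on a dyadic grid, that the $L_p(0,1)$ norm is one. The function name `haar` and the grid size are assumptions of this example.

```python
def haar(k, x, p):
    """Value at x of the k-th Haar function on [0, 1) normalized in L_p."""
    if k == 1:
        return 1.0 if 0.0 <= x < 1.0 else 0.0
    # Decompose k = 2**n + l with 1 <= l <= 2**n.
    n = (k - 1).bit_length() - 1
    l = k - 2 ** n
    h = 2.0 ** (-n - 1)            # half of the support length 2**(-n)
    if (2 * l - 2) * h <= x < (2 * l - 1) * h:
        return 2.0 ** (n / p)
    if (2 * l - 1) * h <= x < 2 * l * h:
        return -(2.0 ** (n / p))
    return 0.0


# Midpoint-rule check of the normalization for p = 3 and k = 1, ..., 8.
N = 1024
grid = [(i + 0.5) / N for i in range(N)]
for k in range(1, 9):
    norm_p = (sum(abs(haar(k, x, 3)) ** 3 for x in grid) / N) ** (1.0 / 3)
    print(k, round(norm_p, 12))
```

The normalization follows from the definition directly: $|H_k^p|^p = 2^n$ on a support of length $2^{-n}$, so the integral of $|H_k^p|^p$ equals one.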

Denote by $\mathcal{T} := \{e^{ikx}\}_{k \in \mathbb{Z}}$ the univariate trigonometric system in the complex
form and denote by $\mathcal{T}^d := \mathcal{T} \times \dots \times \mathcal{T}$ the multivariate trigonometric system.

The following theorem (see [T14]) establishes the existence of greedy bases for $L_p(0, 1)$,
$1 < p < \infty$.

Theorem 3.1. Let $1 < p < \infty$ and let a basis $\Psi$ be $L_p$-equivalent to the Haar basis
$\mathcal{H}_p$. Then for any $f \in L_p(0, 1)$ and any $\rho \in D(f)$ we have

$$ \|f - G_m(f, \Psi, \rho)\|_{L_p} \le C(p, \Psi) \sigma_m(f, \Psi)_{L_p} $$

with a constant $C(p, \Psi)$ independent of $f$, $\rho$, and $m$.

We use in this theorem the following definition of $L_p$-equivalence. We say
that $\Psi = \{\psi_k\}_{k=1}^{\infty}$ is $L_p$-equivalent to $\mathcal{H}_p = \{H_k^p\}_{k=1}^{\infty}$ if for any finite set $\Lambda$ and any
coefficients $c_k$, $k \in \Lambda$, we have

$$ C_1(p, \Psi) \Big\| \sum_{k \in \Lambda} c_k H_k^p \Big\|_{L_p} \le \Big\| \sum_{k \in \Lambda} c_k \psi_k \Big\|_{L_p} \le C_2(p, \Psi) \Big\| \sum_{k \in \Lambda} c_k H_k^p \Big\|_{L_p} $$

with two positive constants $C_1(p, \Psi)$, $C_2(p, \Psi)$ which may depend on $p$ and $\Psi$. For
sufficient conditions on $\Psi$ to be $L_p$-equivalent to $\mathcal{H}_p$ see [FJ] and [DKT].

Thus each basis $\Psi$ which is $L_p$-equivalent to the univariate Haar basis $\mathcal{H}_p$ is a
greedy basis for $L_p(0, 1)$, $1 < p < \infty$. We note that in the case of a Hilbert space
each orthonormal basis is a greedy basis with constant $G = 1$ (see (3.1)).

We give now the definitions of unconditional and democratic bases.

Definition 3.2. A basis $\Psi = \{\psi_k\}_{k=1}^{\infty}$ of a Banach space $X$ is said to be
unconditional if for every choice of signs $\theta = \{\theta_k\}_{k=1}^{\infty}$, $\theta_k = 1$ or $-1$, $k = 1, 2, \dots$, the
linear operator $M_\theta$ defined by

$$ M_\theta \Big( \sum_{k=1}^{\infty} a_k \psi_k \Big) = \sum_{k=1}^{\infty} a_k \theta_k \psi_k $$

is a bounded operator from $X$ into $X$.

Definition 3.3. We say that a basis $\Psi = \{\psi_k\}_{k=1}^{\infty}$ is a democratic basis if for any
two finite sets of indices $P$ and $Q$ with the same cardinality $\#P = \#Q$ we have

$$ \Big\| \sum_{k \in P} \psi_k \Big\| \le D \Big\| \sum_{k \in Q} \psi_k \Big\| $$

with a constant $D := D(X, \Psi)$ independent of $P$ and $Q$.

We proved in [KoT1] the following theorem.




Theorem  3.2.  A  basis  is  greedy  if  and  only  if  it  is  unconditional  and  democratic. 

We remark that Definition 3.1 of a greedy basis for a Banach space is an analog
of Definition 1.2 (see the Introduction) of a greedy dictionary for a Hilbert space. Let us
give an analog of Definition 1.3 of an $r$-greedy dictionary.

Definition 3.4. We call a basis $\Psi$ an $r$-greedy basis for a Banach space $X$ if for each
$f \in X$ such that

$$ \sigma_m(f, \Psi)_X \le m^{-r}, \qquad m = 1, 2, \dots, $$

we have for every $\rho \in D(f)$

$$ \|f - G_m(f, \Psi, \rho)\| \le C(r, \Psi) m^{-r}, \qquad m = 1, 2, \dots. $$

We construct the following example now.

Example 3.1. There exist a Banach space $X$ and a basis $\Psi$ such that $\Psi$ is an
$r$-greedy basis for $X$ for any $r > 0$ and $\Psi$ is not an unconditional basis.

Proof. We use the construction from [KoT1]. Let $X$ be the set of all real sequences
$x = (x_1, x_2, \dots) \in \ell_2$ such that

$$ \|x\|' := \sup_{N \in \mathbb{N}} \Big| \sum_{n=1}^{N} x_n / \sqrt{n} \Big| $$

is finite. Clearly, $X$ equipped with the norm

$$ \|\cdot\| = \max(\|\cdot\|_{\ell_2}, \|\cdot\|') $$

is a Banach space. Let $\psi_k \in X$, $k = 1, 2, \dots$, be defined as

$$ (\psi_k)_n := \begin{cases} 1, & n = k, \\ 0, & n \ne k. \end{cases} $$

We take any $r > 0$ and prove that $\Psi$ is an $r$-greedy basis for $X$. Indeed, the assumption
$\sigma_m(f, \Psi)_X \le m^{-r}$ implies $\sigma_m(f, \Psi)_{\ell_2} \le m^{-r}$ and, therefore,

$$ \|f - G_m(f, \Psi)\|_{\ell_2} \le m^{-r}. $$

Let us prove a similar estimate for $\|\cdot\|'$. Let

$$ G_m(f, \Psi) = \sum_{k \in \Lambda_m} c_k(f) \psi_k. $$

Denote $Q_m(N) := [1, N] \setminus \Lambda_m$. Then

$$ \|f - G_m(f, \Psi)\|' = \sup_{N} \Big| \sum_{k \in Q_m(N)} c_k(f) k^{-1/2} \Big| \le \sum_{k=1}^{\infty} k^{-1/2} (m + k)^{-r-1/2} \ll m^{-r}. $$

This proves that $\Psi$ is an $r$-greedy basis for $X$. It is proved in [KoT1] that $\Psi$ is not
unconditional.
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3.2.  The  Trigonometric  System.  Let  us  consider  nonlinear  approximation  with 
regard  to  the  trigonometric  system  Td ■  The  existence  of  best  m-term  trigonomet¬ 
ric  approximation  was  proved  in  [Ba]  (see  also  [T19]).  The  method  Gm(f )  := 
Gm{f,  Td)  has  one  more  advantage  over  the  traditional  approximation  by  trigono¬ 
metric  polynomials  in  the  case  of  approximation  of  functions  of  several  variables. 
In  this  case  (d  >  1)  there  is  no  natural  order  of  trigonometric  system  and  the 
use  of  Gm  allows  us  to  avoid  the  problem  of  finding  natural  subspaces  of  trigono¬ 
metric  polynomials  for  approximation  purposes.  We  proved  in  [T19]  the  following 
inequality. 

Theorem 3.3. For each $f \in L_p(\mathbb{T}^d)$ we have

$$\|f - G_m(f)\|_p \le (1 + 3m^{h(p)})\,\sigma_m(f,\mathcal{T}^d)_p, \qquad 1 \le p \le \infty,$$

where $h(p) := |1/2 - 1/p|$.

Remark 3.1. For all $1 \le p \le \infty$

$$\|G_m(f)\|_p \le m^{h(p)}\|f\|_p.$$

Remark 3.2. There is a positive absolute constant $C$ such that for each $m$ and
$1 \le p \le \infty$ there exists a function $f \ne 0$ with the property

(3.2)  $\|G_m(f)\|_p \ge C m^{h(p)}\|f\|_p.$

The above results show that the trigonometric system is not a greedy basis for
$L_p$, $p \ne 2$. This leads to a natural attempt to consider some other algorithms
that may have some advantages over TGA in the case of $\mathcal{T}$. We discuss here the
performance of WCGA (see Section 9) with regard to $\mathcal{T}$.

Let us compare the rate of approximation of TGA and WCGA for the class
$A := A(\mathcal{RT})$, where $\mathcal{RT}$ denotes the real trigonometric system $1/2, \sin x, \cos x, \dots$.
We need to switch to this system from the complex trigonometric system because
the algorithm WCGA is defined for a real Banach space. We note that the system
$\mathcal{RT}$ is not normalized in $L_p$ but quasinormalized: $C_1 \le \|t\|_p \le C_2$ for any $t \in \mathcal{RT}$
with absolute constants $C_1$, $C_2$, $1 \le p \le \infty$. This is sufficient for application of the general
methods developed in Section 9. For a sequence $\tau := \{t_k\}$ with $t_k = t$, $k = 1,2,\dots$,
we replace $\tau$ by $t$ in the notation. Theorem 9.1 and (9.6) imply the following result.

Theorem 3.4. Let $0 < t \le 1$. For $f \in A$ we have

(3.3)  $\|f - G_m^{c,t}(f,\mathcal{RT})\|_p \le C(p,t)\, m^{-1/2}, \qquad 2 \le p < \infty.$

This estimate and Theorem 3.3 imply that for $f \in A$ we have

(3.4)  $\|f - G_m(f,\mathcal{RT})\|_p \le C(p)\, m^{-1/p}, \qquad 2 \le p < \infty,$

which is weaker than (3.3). It is proved in [DKTe] that (3.4) cannot be improved.
Thus the WCGA works better than the TGA for the class $A$. We note that the
restriction $p < \infty$ in (3.3) is important. We now give a lower estimate for $m$-term
approximation in $L_\infty$.

Proposition 3.2. For a given $m$ define

$$f := \sum_{k=0}^{2m} \cos 3^k x.$$

Then we have

$$\sigma_m(f,\mathcal{T})_\infty \ge m/4.$$


Proof. Consider the Riesz product

$$\Phi_0(x) := \prod_{j\in[0,2m]} (1 + \cos 3^j x) - 1.$$

This function has nonzero Fourier coefficients only with frequencies of the form

$$k(s) = \sum_{j=0}^{j(s)} s_j 3^j, \qquad s = (s_0, \dots, s_{2m}),$$

with $0 \le j(s) \le 2m$, $s_j = -1, 0, 1$ for $j < j(s)$, $s_{j(s)} = 1$, and $s_j = 0$ for $j(s) < j \le 2m$.
It is clear that $k(s)$ is uniquely defined by $s$. Take any polynomial of the form

$$t(x) = \sum_{k\in\Lambda} a_k \cos kx, \qquad \#\Lambda = m.$$

Then for each $k \in \Lambda$ we look for an $s$ such that $k = k(s)$. If we do not find such an
$s$ we have

$$\langle \cos kx, \Phi_0 \rangle = 0.$$

For those $s$ that were found to satisfy $k(s) = k$, $k \in \Lambda$, we form a set $J$ consisting
of all $j(s)$ and define the new Riesz product

$$\Phi(x) := \prod_{j\in[0,2m]\setminus J} (1 + \cos 3^j x) - 1.$$

Then we have

$$\langle t, \Phi \rangle = 0$$

and

$$m + 1 \le \langle f, \Phi \rangle = \langle f - t, \Phi \rangle \le \|f - t\|_\infty \|\Phi\|_1 \le 4\|f - t\|_\infty.$$

This implies

$$\sigma_m(f,\mathcal{T})_\infty \ge m/4.$$
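
The frequency structure of the Riesz product used in this proof can be checked on a small case. The following Python sketch is ours and is not part of the argument: the helper names `multiply` and `riesz_product` and the choice $n = 4$ (playing the role of $2m$) are illustrative assumptions. It expands $\prod_{j=0}^{n}(1 + \cos 3^j x) - 1$ in exact arithmetic and confirms that the sums $\sum_j s_j 3^j$ with $s_j \in \{-1,0,1\}$ are pairwise distinct and that each $\cos 3^j x$ enters with coefficient 1.

```python
from fractions import Fraction


def multiply(poly, freq):
    """Multiply a trigonometric polynomial by (1 + cos(freq * x)).

    `poly` maps an integer frequency k to the coefficient of exp(i k x).
    """
    half = Fraction(1, 2)
    out = {}
    for k, c in poly.items():
        # 1 + cos(freq x) = 1 + exp(i freq x)/2 + exp(-i freq x)/2
        for shift, weight in ((0, Fraction(1)), (freq, half), (-freq, half)):
            out[k + shift] = out.get(k + shift, Fraction(0)) + c * weight
    return {k: c for k, c in out.items() if c != 0}


def riesz_product(n):
    """Return prod_{j=0}^{n} (1 + cos(3^j x)) - 1 as a frequency -> coefficient dict."""
    poly = {0: Fraction(1)}
    for j in range(n + 1):
        poly = multiply(poly, 3 ** j)
    poly[0] -= 1  # subtract the constant 1
    return {k: c for k, c in poly.items() if c != 0}


n = 4
phi0 = riesz_product(n)
# Number of distinct nonzero frequencies sum_j s_j 3^j with s_j in {-1, 0, 1}.
num_frequencies = len(phi0)
# The coefficient of cos(3^j x) is twice the coefficient of exp(i 3^j x).
cos_coeffs = [2 * phi0[3 ** j] for j in range(n + 1)]
```

Since every factor contributes $\cos 3^j x$ with coefficient 1, pairing such a product with $f = \sum_k \cos 3^k x$ simply counts the factors that remain after removing the indices in $J$, which is where the lower bound $m+1$ in the proof comes from.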


3.3 Greedy Bases. Direct and Inverse Theorems. Theorem 3.1 points out
the importance of bases $L_p$-equivalent to the Haar basis. We will now discuss
necessary and sufficient conditions for $f$ to have a prescribed decay of $\{\sigma_m(f,\Psi)_p\}$
under the assumption that $\Psi$ is $L_p$-equivalent to the Haar basis $\mathcal{H}_p$, $1 < p < \infty$. We
will express these conditions in terms of the coefficients $\{f_n\}$ of the expansion

$$f = \sum_{n=1}^{\infty} f_n \psi_n.$$

The following lemma from [T14] plays the key role in this consideration.

Lemma 3.1. Let a basis $\Psi$ be $L_p$-equivalent to $\mathcal{H}_p$, $1 < p < \infty$. Then for any finite
$\Lambda$ and $a \le |c_n| \le b$, $n \in \Lambda$, we have

$$C_1(p,\Psi)\, a\, (\#\Lambda)^{1/p} \le \Big\|\sum_{n\in\Lambda} c_n \psi_n\Big\|_p \le C_2(p,\Psi)\, b\, (\#\Lambda)^{1/p}.$$


We formulate a general statement and then consider several important particular
examples of the rate of decrease of $\{\sigma_m(f,\Psi)_p\}$. We begin by introducing some
notation. For a sequence $\epsilon = \{\epsilon_k\}_{k=0}^{\infty}$ of positive numbers
decreasing monotonically to zero (we write $\epsilon \in MDP$) we define inductively a
sequence $\{N_s\}_{s=0}^{\infty}$ of nonnegative integers:

(3.5)  $N_0 = 0$; $N_{s+1}$ is the smallest integer satisfying $\epsilon_{N_{s+1}} < 2^{-1}\epsilon_{N_s}$; $\quad n_s := N_{s+1} - N_s$.

We  are  going  to  consider  the  following  examples  of  sequences. 

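Example 3.2. Take $\epsilon_0 = 1$ and $\epsilon_k = k^{-r}$, $r > 0$, $k = 1,2,\dots$. Then

$$N_{s+1} = [2^{1/r}N_s] + 1 \quad\text{and}\quad n_s = [2^{1/r}N_s] + 1 - N_s,$$

which implies

$$N_s \asymp 2^{s/r} \quad\text{and}\quad n_s \asymp 2^{s/r}.$$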
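
The recursion (3.5) is easy to run directly. The sketch below is only an illustration under our own assumptions: the helper name `breakpoints` is hypothetical, we take $r = 1$, and we use exact rational arithmetic to avoid rounding at the comparison $\epsilon_N < \epsilon_{N_s}/2$. It computes the first few $N_s$ for $\epsilon_k = k^{-r}$ and compares them with $2^{s/r}$.

```python
from fractions import Fraction


def breakpoints(eps, count):
    """Return [N_0, ..., N_count], where N_0 = 0 and N_{s+1} is the
    smallest integer N with eps(N) < eps(N_s) / 2, as in (3.5)."""
    ns = [0]
    for _ in range(count):
        target = eps(ns[-1]) / 2
        n = ns[-1] + 1
        while eps(n) >= target:
            n += 1
        ns.append(n)
    return ns


r = 1  # integer exponent, so that eps_k = k^{-r} is an exact rational number


def eps(k):
    # eps_0 = 1 and eps_k = k^{-r} for k >= 1
    return Fraction(1) if k == 0 else Fraction(1, k ** r)


N = breakpoints(eps, 10)
# Ratios N_s / 2^{s/r}; they stay between two positive constants.
ratios = [N[s] / 2 ** (s / r) for s in range(1, 11)]
```

In this run the closed form $N_{s+1} = [2^{1/r}N_s] + 1$ is satisfied from $s = 1$ on, and the ratios $N_s / 2^{s/r}$ remain bounded above and below, in line with $N_s \asymp 2^{s/r}$.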


Example 3.3. Fix $0 < b < 1$ and take $\epsilon_k = 2^{-k^b}$, $k = 0,1,2,\dots$. Then

$$N_s = s^{1/b} + O(1) \quad\text{and}\quad n_s \asymp s^{1/b-1}.$$

Let $f \in L_p$. Rearrange the sequence $\|f_n\psi_n\|_p$ in decreasing order

$$\|f_{n_1}\psi_{n_1}\|_p \ge \|f_{n_2}\psi_{n_2}\|_p \ge \dots$$

and denote

$$a_k(f,p) := \|f_{n_k}\psi_{n_k}\|_p.$$

We now give some inequalities for $a_k(f,p)$ and $\sigma_m(f,\Psi)_p$. We will use the brief notation
$\sigma_m(f)_p := \sigma_m(f,\Psi)_p$ and $\sigma_0(f)_p := \|f\|_p$.

Lemma 3.2. For any two positive integers $N < M$ we have

$$a_M(f,p) \le C(p,\Psi)\,\sigma_N(f)_p\,(M-N)^{-1/p}.$$

Lemma 3.3. For any sequence $m_0 < m_1 < m_2 < \dots$ of nonnegative integers we
have

$$\sigma_{m_s}(f)_p \le C(p,\Psi) \sum_{l=s}^{\infty} a_{m_l}(f,p)\,(m_{l+1} - m_l)^{1/p}.$$

Theorem 3.5. Assume a given sequence $\epsilon \in MDP$ satisfies the conditions

$$\epsilon_{N_s} \ge C_1 2^{-s}, \qquad n_{s+1} \le C_2 n_s, \qquad s = 0,1,2,\dots.$$

Then we have the equivalence

$$\sigma_n(f)_p \ll \epsilon_n \iff a_{N_s}(f,p) \ll 2^{-s} n_s^{-1/p}.$$

Corollary 3.1. Theorem 3.5 applied to Examples 3.2, 3.3 gives the following relations:

(3.6)  $\sigma_m(f)_p \ll (m+1)^{-r} \iff a_n(f,p) \ll n^{-r-1/p},$

(3.7)  $\sigma_m(f)_p \ll 2^{-m^b} \iff a_n(f,p) \ll 2^{-n^b} n^{(b-1)/p}.$

Remark 3.3. Making use of Lemmas 3.2 and 3.3 we can prove a version of Corollary 3.1
with the sign $\ll$ replaced by $\asymp$.

Theorem 3.5 and Corollary 3.1 are in the spirit of the classical Jackson-Bernstein direct
and inverse theorems in linear approximation theory, where conditions on the
corresponding sequences of approximating characteristics are imposed in the form

(3.8)  $E_n(f)_p \ll \epsilon_n$, or $\|\{E_n(f)_p/\epsilon_n\}\|_{l_\infty} < \infty$.

It is well known (see [D]) that in studying many questions of approximation theory it
is convenient to consider, along with restriction (3.8), its following generalization

(3.9)  $\|\{E_n(f)_p/\epsilon_n\}\|_{l_q} < \infty$.

Lemmas  3.2  and  3.3  are  also  useful  in  considering  this  more  general  case.  For 
instance,  in  the  particular  case  of  Example  3.2  one  gets  the  following  statement. 

Theorem 3.6. Let $1 < p < \infty$ and $0 < q < \infty$. Then for any positive $r$ we have
the equivalence relation

$$\sum_m \sigma_m(f)_p^q\, m^{rq-1} < \infty \iff \sum_n a_n(f,p)^q\, n^{rq-1+q/p} < \infty.$$


Remark 3.4. The condition

$$\sum_n a_n(f,p)^q\, n^{rq-1+q/p} < \infty$$

with $q = \beta := (r + 1/p)^{-1}$ takes a very simple form

(3.10)  $\sum_n a_n(f,p)^\beta = \sum_n \|f_n\psi_n\|_p^\beta < \infty.$

In the case $\Psi = \mathcal{H}_p$ the condition (3.10) is equivalent to $f$ being in the Besov space $B_\beta^r(L_\beta)$.

Corollary 3.2. Theorem 3.6 implies the following relation

$$\sum_m \sigma_m(f,\mathcal{H})_p^\beta\, m^{r\beta-1} < \infty \iff f \in B_\beta^r(L_\beta),$$

where $\beta := (r + 1/p)^{-1}$.

A statement similar to Corollary 3.2 for free knot spline approximation was
proved by P. Petrushev [P]. Corollary 3.2 and further results in this direction can
be found in [DP] and [DJP]. We want to remark here that conditions in terms
of $a_n(f,p)$ are convenient in applications. For instance, the relation (3.6) can be
rewritten using the idea of thresholding. For a given $f \in L_p$ denote

$$T(\epsilon) := \#\{a_k : a_k(f,p) \ge \epsilon\}.$$

Then (3.6) is equivalent to

$$\sigma_m(f)_p \ll (m+1)^{-r} \iff T(\epsilon) \ll \epsilon^{-(r+1/p)^{-1}}.$$

For  further  results  in  this  direction  see  [D],  [CDH],  [Os]. 
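
The counting function $T(\epsilon)$ can be illustrated on a model sequence. In the sketch below we assume, purely for illustration, that $a_k(f,p) = k^{-r-1/p}$ exactly, with $r = 1$ and $p = 2$; the function name `threshold_count` and the threshold levels are our own choices. For such a sequence the number of terms above the level $\epsilon$ is the integer part of $\epsilon^{-(r+1/p)^{-1}}$.

```python
def threshold_count(a, eps):
    """T(eps): the number of entries of the sequence `a` that are >= eps."""
    return sum(1 for x in a if x >= eps)


r, p = 1.0, 2.0
beta = 1.0 / (r + 1.0 / p)  # the exponent (r + 1/p)^{-1}

# Model decay a_k = k^{-r-1/p}; 1000 terms are enough for the levels below.
a = [k ** (-(r + 1.0 / p)) for k in range(1, 1001)]

levels = [0.3, 0.03, 0.003]
counts = [threshold_count(a, e) for e in levels]
bounds = [e ** (-beta) for e in levels]  # eps^{-(r+1/p)^{-1}}
```

Here `counts` never exceeds `bounds`, which is the thresholding form of the decay $a_n(f,p) \ll n^{-r-1/p}$ for this model sequence.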

3.4 Stability. In this section we assume that a basis $\Psi = \{\psi_k\}_{k=1}^{\infty}$ is an unconditional
normalized ($\|\psi_k\| = 1$, $k = 1,2,\dots$) basis for $X$ (see Definition 3.2).

The uniform boundedness principle implies that the unconditional constant

$$K := K(X,\Psi) := \sup_\theta \|M_\theta\|$$

is finite.

The following theorem is a well known fact about unconditional bases (see [LT], p. 19).

Theorem 3.7. Let $\Psi$ be an unconditional basis for $X$. Then for every choice of
bounded scalars $\{\lambda_k\}_{k=1}^{\infty}$ we have

$$\Big\|\sum_{k=1}^{\infty} \lambda_k a_k \psi_k\Big\| \le 2K \sup_k |\lambda_k| \, \Big\|\sum_{k=1}^{\infty} a_k \psi_k\Big\|$$

(in the case of a real Banach space $X$ we can take $K$ instead of $2K$).

In numerical implementation of nonlinear $m$-term approximation one usually
prefers to employ the strategy known as thresholding (see [D, S.7.8]) instead of
a greedy algorithm. We define and study here soft thresholding. Let a real
function $v(x)$ defined for $x \ge 0$ satisfy the following relations:

(3.11)  $v(x) = \begin{cases} 1, & \text{for } x \ge 1, \\ 0, & \text{for } 0 \le x \le 1/2, \end{cases}$

(3.12)  $|v(x)| \le A, \qquad x \in [0,1];$

there is a constant $C_L$ such that for any $x, y \in [0,\infty)$ we have

(3.13)  $|v(x) - v(y)| \le C_L |x - y|.$

Let

$$f = \sum_{k=1}^{\infty} c_k(f)\psi_k.$$

We define a soft thresholding mapping $T_{\epsilon,v}$ as follows. Take $\epsilon > 0$ and set

$$T_{\epsilon,v}(f) := \sum_k v(|c_k(f)|/\epsilon)\, c_k(f)\psi_k.$$

Theorem 3.7 implies that

(3.14)  $\|T_{\epsilon,v}(f)\| \le 2AK\|f\|.$

It was proved in [T21] that the mapping $T_{\epsilon,v}$ satisfies the Lipschitz condition with
a constant independent of $\epsilon$.

Theorem 3.8. For any $\epsilon$ and any functions $f, g \in X$ we have

$$\|T_{\epsilon,v}(f) - T_{\epsilon,v}(g)\| \le (3A + 2C_L)\,2K\,\|f - g\|.$$
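
To make the soft thresholding mapping concrete, here is a minimal sketch under simplifying assumptions that are ours and are not taken from [T21]: the space is a finite-dimensional real $l_2$ with its canonical orthonormal basis (so that one may take $K = 1$), and $v$ is the piecewise linear function equal to 0 on $[0,1/2]$ and to 1 on $[1,\infty)$, for which $A = 1$ and $C_L = 2$. The names `v`, `soft_threshold` and `l2_norm` are illustrative. The code applies the mapping coefficientwise and records the largest observed ratio $\|T(f) - T(g)\| / \|f - g\|$ over random pairs.

```python
import math
import random


def v(x):
    """Piecewise linear cutoff: 0 on [0, 1/2], 1 on [1, inf), linear in between."""
    if x <= 0.5:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 2.0 * x - 1.0


def soft_threshold(coeffs, eps):
    """Apply c_k -> v(|c_k| / eps) * c_k to every coefficient."""
    return [v(abs(c) / eps) * c for c in coeffs]


def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))


random.seed(0)
A, C_L, K = 1.0, 2.0, 1.0  # constants for this particular v and basis
worst = 0.0
for _ in range(200):
    f = [random.uniform(-1.0, 1.0) for _ in range(50)]
    g = [c + random.uniform(-0.05, 0.05) for c in f]
    eps = random.uniform(0.05, 1.0)
    tf, tg = soft_threshold(f, eps), soft_threshold(g, eps)
    num = l2_norm([x - y for x, y in zip(tf, tg)])
    den = l2_norm([x - y for x, y in zip(f, g)])
    worst = max(worst, num / den)

example = soft_threshold([1.0, 0.2, -0.75], 1.0)
```

In this experiment `worst` stays below the constant $(3A + 2C_L)\,2K$ of Theorem 3.8, and the bound does not involve $\epsilon$. Coefficients below $\epsilon/2$ are removed, those above $\epsilon$ are kept unchanged, and those in between are damped continuously.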


Open  problems. 

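3.1. Does the inequality

$$\|f - G_m^{c,t}(f,\mathcal{RT})\|_p \le C_1(p,t)\,\sigma_n(f,\mathcal{RT})_p$$

hold for any $f \in L_p(\mathbb{T})$, $1 < p < \infty$, with $m \le C_2(p,t)\,n$?

3.2. Does the inequality

$$\|f - G_m^{c,t}(f,\mathcal{H}_p)\|_p \le C_1(p,t)\,\sigma_n(f,\mathcal{H}_p)_p$$

hold for any $f \in L_p(0,1)$, $1 < p < \infty$, with $m \le C_2(p,t)\,n$?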

3.3.  Find  the  order  of  the  quantity 

sup  \\f-GcA(f,UT)\\p,  l<p<oo. 
few; 

3.4. Find a greedy type algorithm realizing near best approximation in $L_p([0,1]^d)$,
$1 < p < \infty$, $d \ge 2$, with regard to $\mathcal{H}_p$ for individual functions.

4.  Some  Convergence  Results 

In Section 3 we discussed greedy bases. That is justified from the point of view
of efficient approximation. It follows from Proposition 3.1 that the inequality

(4.1)  $\|G_m(f,\Psi,\rho)\| \le (G+1)\|f\|$

holds for all $m$ and all $f \in X$ for every $\rho \in D(f)$.

Definition 4.1. We say that a basis $\Psi$ is quasi-greedy if there exists a constant
$C_Q$ such that for any $f \in X$ and any finite set of indices $\Lambda$ having the property

(4.2)  $\min_{k\in\Lambda} |c_k(f)| \ge \max_{k\notin\Lambda} |c_k(f)|,$

we have

(4.3)  $\|S_\Lambda(f,\Psi)\| = \Big\|\sum_{k\in\Lambda} c_k(f)\psi_k\Big\| \le C_Q\|f\|.$

It is clear that the inequalities (4.1) and (4.3) are equivalent. P. Wojtaszczyk
[W2] proved that a basis $\Psi$ is quasi-greedy if and only if the sequence $\{G_m(f,\Psi,\rho)\}$
converges to $f$ for all $f \in X$ and any $\rho \in D(f)$. We constructed in [KoT1] an
example of a quasi-greedy basis that is not an unconditional basis (and, therefore,
not a greedy basis). We have the following theorem for the trigonometric system.

Theorem 4.1. The trigonometric system $\mathcal{T}$ is not a quasi-greedy basis for $L_p$ if
$p \ne 2$.

This theorem was proved in [T19]; for $p < 2$ it was also proved independently,
and by a different method, in [CF]. We mention here that the method from
[T19] gives a little more than stated in Theorem 4.1.

Theorem 4.2. There exists a continuous function $f$ such that $G_m(f,\mathcal{T})$ does not
converge to $f$ in $L_p$ for any $p > 2$.

Theorem 4.3. There exists a function $f$ that belongs to any $L_p$, $p < 2$, such that
$G_m(f,\mathcal{T})$ does not converge to $f$ in measure.

The proof of both theorems is based on two examples (one for $p > 2$ and the
other for $p < 2$) constructed in [T19, pp. 574-575]. We prove here only Theorem
4.3, where we use the example from [T19] for $p < 2$.

Proof of Theorem 4.3. We use the Rudin-Shapiro polynomials (see [KS])

$$R_N(x) = \sum_{k=0}^{N-1} \epsilon_k e^{ikx}, \qquad \epsilon_k = \pm 1, \quad x \in \mathbb{T},$$

that satisfy the inequality

(4.4)  $\|R_N\|_\infty \le C N^{1/2}$

with an absolute constant $C$. Denote for $s = \pm 1$

$$\Lambda_s(N) := \{k : \hat{R}_N(k) = s\}.$$

Denote also

$$D_\Lambda(x) := \sum_{k\in\Lambda} e^{ikx}.$$

Then

$$R_N = D_{\Lambda_{+1}(N)} - D_{\Lambda_{-1}(N)}.$$

The inequality (4.4) implies

$$\|R_N\|_1 \gg N^{1/2}.$$

Using this inequality we prove that there exist two positive constants $c_1$ and $c_2$
such that for one of $s = \pm 1$ we have

(4.5)  $m\{x : |D_{\Lambda_s(N)}(x)| \ge c_1 N^{1/2}\} \ge c_2.$

We define a function $f$ from Theorem 4.3 as follows

$$f := \sum_{\nu=1}^{\infty} 2^{-\nu/2} e^{i2^\nu x} \big(D_{\Lambda_s(2^\nu)} + s 2^{-\nu} R_{2^\nu}\big).$$

Then for appropriately chosen $m_1$ and $m_2$ we get

$$G_{m_1}(f,\mathcal{T}) - G_{m_2}(f,\mathcal{T}) = 2^{-\nu/2} e^{i2^\nu x} (1 + 2^{-\nu}) D_{\Lambda_s(2^\nu)}$$

and by (4.5)

$$m\{x : |G_{m_1}(f) - G_{m_2}(f)| \ge c_1\} \ge c_2,$$

which shows that $\{G_m(f,\mathcal{T})\}$ does not converge in measure. Further, for any $1 <
p < 2$ we have

$$\|D_{\Lambda_s(2^\nu)} + s 2^{-\nu} R_{2^\nu}\|_p \le C 2^{\nu(1-1/p)},$$

which implies that $f \in L_p$.
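
For $N = 2^n$ the Rudin-Shapiro polynomials used in this proof can be generated by the classical recursion $P_{j+1} = P_j + e^{i2^j x}Q_j$, $Q_{j+1} = P_j - e^{i2^j x}Q_j$, $P_0 = Q_0 = 1$, for which $|P_n|^2 + |Q_n|^2 = 2^{n+1}$ and hence (4.4) holds with $C = \sqrt{2}$. The sketch below is ours: the function names are illustrative, only the case $N = 2^n$ is treated, and the supremum norm is sampled on a finite grid rather than computed exactly. It builds the signs $\epsilon_k$, checks the bound on the grid, and splits the indices into the sets $\Lambda_{+1}(N)$ and $\Lambda_{-1}(N)$.

```python
import cmath
import math


def rudin_shapiro_signs(n):
    """Coefficients (+1 / -1) of the Rudin-Shapiro polynomial P_n of length 2^n.

    Multiplying Q_j by exp(i 2^j x) shifts its coefficients by 2^j places,
    so the recursion amounts to list concatenation.
    """
    p, q = [1], [1]
    for _ in range(n):
        p, q = p + q, p + [-c for c in q]
    return p


def evaluate(signs, x):
    """Value of sum_k signs[k] * exp(i k x)."""
    return sum(e * cmath.exp(1j * k * x) for k, e in enumerate(signs))


n = 6
signs = rudin_shapiro_signs(n)
N = len(signs)  # N = 2^n

# Sample |R_N| on a uniform grid of the circle.
grid = [2.0 * math.pi * j / (8 * N) for j in range(8 * N)]
sup_norm = max(abs(evaluate(signs, x)) for x in grid)

# Index sets Lambda_{+1}(N) and Lambda_{-1}(N), so that R_N = D_{+1} - D_{-1}.
lam_plus = [k for k, e in enumerate(signs) if e == 1]
lam_minus = [k for k, e in enumerate(signs) if e == -1]
```

On the grid the sampled maximum stays below $\sqrt{2N}$, while a generic choice of signs would only guarantee the trivial bound $N$.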

We also mention two interesting results on convergence almost everywhere. T.W.
Körner, answering a question raised by Carleson and Coifman, constructed in [K1]
a function from $L_2$ and then in [K2] a continuous function such that $\{G_m(f,\mathcal{T})\}$
diverges almost everywhere. T. Tao [Ta] proved that for the Haar system we have
convergence: the sequence $\{G_m(f,\mathcal{H}_p)\}$ converges almost everywhere to $f$ for any
$f \in L_p$, $1 < p < \infty$.

Open  problems. 

4.1. Does the $L_p$-Greedy Algorithm with regard to $\mathcal{T}$ converge in $L_p$, $1 < p < \infty$,
for each $f \in L_p(\mathbb{T})$?

4.2. Does the Dual Greedy Algorithm with regard to $\mathcal{T}$ converge in $L_p$, $1 < p < \infty$,
for each $f \in L_p(\mathbb{T})$?

4.3. Does the $L_p$-Greedy Algorithm with regard to $\mathcal{H}_p$ converge in $L_p$, $1 < p < \infty$,
for each $f \in L_p(0,1)$?

4.4. Does the Dual Greedy Algorithm with regard to $\mathcal{H}_p$ converge in $L_p$, $1 < p < \infty$,
for each $f \in L_p(0,1)$?

5.  Widths.  Optimal  methods  in  Linear  Approximation 

In Sections 2 and 3 a basis $\Psi$ was chosen a priori. This is the case in many problems
where an application to physical or engineering problems dictates the choice of a basis.
However, in many other problems we can choose an appropriate basis for
approximation. This leads to a search for optimal bases of approximation. The
first result in this direction was obtained by Kolmogorov. In 1936 Kolmogorov
introduced the concept of the width of a class $F$ in a space $X$:

$$d_n(F,X) = \inf_{\{\phi_j\}_{j=1}^n}\ \sup_{f\in F}\ \inf_{\{c_j\}_{j=1}^n} \Big\|f - \sum_{j=1}^n c_j\phi_j\Big\|_X.$$

This concept allows us to find for fixed $n$ and for a class $F$ a subspace of dimension $n$,
optimal with respect to the construction of an approximating element as the element
of best approximation. In other words, the concept of width allows us to choose
among various Chebyshev methods having the same quantitative characteristic of
complexity (the dimension of the approximating subspace) the one which has the
greatest accuracy. The first result about widths, namely Kolmogorov's result (1936)

$$d_{2n+1}(W_2^r, L_2) = (n+1)^{-r},$$

showed that the best subspace of dimension $2n+1$ for approximation of classes of
periodic functions is the subspace of trigonometric polynomials of order $n$. This
result confirmed that the approximation of functions in the class $W_2^r$ by trigonometric
polynomials is natural. Further estimates of the widths $d_{2n+1}(W_q^r, L_p)$, $1 \le q,
p \le \infty$, some of which are discussed here, showed that for some values of the
parameters $q$, $p$ the subspace of trigonometric polynomials of order $n$ is optimal (in
the sense of order) but for other values of $q$, $p$ this subspace is not optimal.

The Ismagilov [I] estimate for the quantity $d_n(W_1^r, L_\infty)$ gave the first example
where the subspace of trigonometric polynomials of order $n$ is not optimal. This
phenomenon was thoroughly studied by Kashin [Ka1].

In  analogy  to  the  concept  of  Kolmogorov  width,  that  is,  to  the  problem  concern¬ 
ing  the  best  Chebyshev  method,  the  problems  concerning  the  best  linear  method 
and  the  best  Fourier  method  were  considered. 

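Tikhomirov [Ti] introduced the concept of linear width:

$$\lambda_n(F,X) := \inf_{A:\ \operatorname{rank} A \le n}\ \sup_{f\in F} \|f - Af\|_X,$$

and the concept of orthowidth (Fourier width) was introduced in [T4]:

$$\varphi_n(F,X) := d_n^{\perp}(F,X) := \inf_{\text{orthonormal system } \{u_i\}_{i=1}^n}\ \sup_{f\in F} \Big\|f - \sum_{i=1}^n \langle f, u_i\rangle u_i\Big\|_X.$$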

All these widths have as a starting point a function class $F$. Thus in this setting we
choose a priori a function class $F$ and look for optimal subspaces for approximation
of a given class. The following results are well known [Te2]. We present these
results for $r$ a positive integer. Similar results hold for any $r$ greater than some
$\alpha(q,p) \le 1$, which is defined below in Theorem 5.1. Positive integers satisfy the
inequality $r > \alpha(q,p)$ for all $1 \le q,p \le \infty$, except $q = 1$, $p = \infty$ where we have
$\alpha(1,\infty) = 1$. Thus in the case $q = 1$, $p = \infty$ we assume $r > 1$.

A. In the case $1 \le p \le q \le \infty$ or $1 \le q \le p \le 2$ one has

(5.1)  $\varphi_n(W_q^r, L_p) \asymp \lambda_n(W_q^r, L_p) \asymp d_n(W_q^r, L_p) \asymp n^{-r+(1/q-1/p)_+}.$

B. In the case $1 \le q \le p \le \infty$, $p > 2$, one has

$$\lambda_n(W_q^r, L_p) \asymp n^{-r+\max(1/q-1/2,\ 1/2-1/p)}.$$

In the case A the classical trigonometric system provides the optimal orders for all
widths, except $\varphi_n$ for $q = p = 1, \infty$. Let us discuss the more interesting case B for the
particular choice of $q = 2$ and $p = \infty$. We have

(5.2)  $d_n(W_2^r, L_\infty) \asymp n^{-r},$

(5.3)  $\lambda_n(W_2^r, L_\infty) \asymp \varphi_n(W_2^r, L_\infty) \asymp n^{-r+1/2}.$

These relations show that if we drop the linearity requirement for the approximation
method we gain in accuracy a factor $n^{-1/2}$. However, there is a big difficulty in
the realization of the estimate (5.2). We know by Kashin's result that there exists a
subspace realizing (5.2) but we do not know a way to construct it. Thus it is only
an existence theorem for now.

Let us discuss one more special case $q = 1$ and $p = \infty$. In this case we have

(5.4)  $d_n(W_1^r, L_\infty) \asymp \lambda_n(W_1^r, L_\infty) \asymp n^{-r+1/2}$

and

(5.5)  $\varphi_n(W_1^r, L_\infty) \asymp n^{-r+1}.$

Therefore, by (5.4) the best possible approximation (in the sense of order) can be
realized by a linear method, say, $A_n$. However, by (5.5) this linear method $A_n$ is
certainly not an orthogonal projector. Moreover, by [Te2] it cannot satisfy even
the following much weaker restriction: $\|A_n(e^{ikx})\|_2 \le C$, $k \in \mathbb{Z}$. This means that
the optimal linear operator $A_n$ is unstable. A small change in some of the Fourier
coefficients of $f$ may result in a big change of $\|A_n(f)\|_2$.

Let us draw some conclusions now. In Linear Approximation of $W_q^r$ in $L_p$ the
bottom line is given by $\varphi_n(W_q^r, L_p)$, where the approximation method is the simplest
orthogonal projection. Partial sums with regard to classical systems provide an
optimal error of approximation for this width. The trigonometric system works for
all $1 \le q,p \le \infty$ except $(q,p) = (1,1), (\infty,\infty)$. The wavelet systems (see [AT])
work for all $1 \le q,p \le \infty$. On the example of the pair $(W_1^r, L_\infty)$ we have seen that
we need to sacrifice important and convenient properties of the approximating operator
in order to achieve better accuracy. On the example of $(W_2^r, L_\infty)$ we have seen that
we need to pay an even bigger price for better accuracy, in the form of proving only an
existence theorem instead of providing a constructive method of approximation.

Let us continue the discussion from Section 2 on the interplay between approximation
of individual functions and function classes. Let us first try to associate with an
individual function $f$ a sequence of the Kolmogorov widths. It is clear that the
choice $F[f] := \{f\}$ does not work because $d_1(F[f]) = 0$ for each $f$. The idea is to
find a minimal reasonable class that contains $f$. In the periodic case it is natural to
associate with $f(x)$ all translates $f(x - y)$. Thus define $F[f] := \{f(x - y),\ y \in \mathbb{T}\}$.
All known classes of periodic functions are shift invariant. In such a case we have
for $f \in F$ that $F[f] \subset F$ and $d_n(F[f],X) \le d_n(F,X)$. We will present some results
from [T5]. For $r \in \mathbb{Z}_+$, $\alpha \in \mathbb{R}_+$ denote

Wr Hq  :=  {/  :  f —  absolutely  continuous, 

ll/MW-/(r)fa)||,  <\x-y\“,  x,y£  T}. 

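Theorem 5.1. Let $1 \le q \le p \le \infty$ or $2 \le p \le q \le \infty$. Then each class $W^rH_q^\alpha$
with $0 < \alpha < 1$ and $r + \alpha > \alpha(q,p)$ contains a function $f$ such that

$$\liminf_{n\to\infty}\ d_n(F[f],L_p)/d_n(W^rH_q^\alpha, L_p) > 0.$$

We define here $\alpha(q,p) := (1/q - 1/p)_+$ for $1 \le q \le p \le 2$, $2 \le p \le q \le \infty$ and
$\alpha(q,p) := \max(1/q, 1/2)$ for $1 \le q \le p \le \infty$, $p > 2$.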

Let us consider one particular case $q = p = \infty$, $\alpha = 1$, that is not covered by Theorem 5.1.
As established by Tikhomirov [Ti], the values of the Kolmogorov width
in this case are given by approximations by trigonometric polynomials. Results of
Nikol'skii and the author mentioned in Section 2 show that each class $W_\infty^r$ contains
a function asymptotically extreme for the best approximation by trigonometric
polynomials. It turns out that the picture is different for the asymptotic behavior of
the widths $d_n(F[f], L_\infty)$.

Theorem 5.2. Any function $f \in W_\infty^r$, $r > 1/2$, satisfies

$$d_n(F[f], L_\infty) = o(d_n(W_\infty^r, L_\infty)).$$

It is interesting to note that for any periodic function $f \in L_p(\mathbb{T})$ we have

(5.6)  (Tm(f(x  -  y),R)p,oo  =  dm(F[f],Lp )  <  (Tm{f ,  T)p. 

It  is  proved  in  [T5]  that  for  1  <  q  <  p  <  oo  one  has 

(5.7)  d%d(WrH«,Lp):=  sup  dm(F[f],Lp)^ 

f£WrH“ 

dm{WrH*,Lp)  X  m-r-a+(l/,-max(l/2,l/p))  + 

provided $r + \alpha > \alpha(q,p)$ with $\alpha(q,p)$ defined in Theorem 5.1. We proved in [DT1]
that

(5.8)  $\sigma_m(W^rH_q^\alpha, \mathcal{T})_p \asymp m^{-r-\alpha+(1/q-\max(1/2,1/p))_+}$

under the same assumption $r + \alpha > \alpha(q,p)$. Relations (5.7) and (5.8) show that
for any pair $(q,p)$, $1 \le q \le p \le \infty$, and for each function $f \in W^rH_q^\alpha$ the
trigonometric system $\mathcal{T}$ provides a subspace $\mathcal{T}(\Lambda)$, $\#\Lambda \le m$, with frequencies in $\Lambda$
such that


dm(T[/],Lp)  <  sup  inf 

j/GT  tGT(A) 


ll/(- 


Open  problems. 

5.1.  Construct  a  subspace  realizing  (5.2). 

5.2.  Does  there  exist  /  G  Wr H^,  0  <  a  <  1,  such  that 


dn{F[f],  Li)  »

6. Optimal Methods in Nonlinear Approximation

In the widths problem of Linear Approximation we were looking for an optimal
$n$-dimensional subspace for approximating a given function class. A nonlinear analog
of this setting is the following. Let a function class $F$ and a Banach space $X$
be given. Assume that on the basis of some additional information we know that
our basis for $m$-term approximation should satisfy some structural properties, for
instance, has to be orthogonal. Then similarly to the setting for the widths $d_n$,
$\lambda_n$, $\varphi_n$ we get the optimization problems for $m$-term nonlinear approximation (see
Introduction). Let $\mathbb{B}$ be a collection of bases satisfying a given property.

I. Define an analog of the Kolmogorov width

$$\sigma_m(F,\mathbb{B})_X := \inf_{\Psi\in\mathbb{B}}\ \sup_{f\in F}\ \sigma_m(f,\Psi)_X.$$

II. Define an analog of the orthowidth

$$\gamma_m(F,\mathbb{B})_X := \inf_{\Psi\in\mathbb{B}}\ \sup_{f\in F}\ \|f - G_m(f,\Psi)\|_X.$$
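
In the simplest setting the two inner quantities entering I and II, the best $m$-term error and the error of the greedy approximant $G_m$, can be computed directly and coincide. The sketch below assumes a finite-dimensional real Hilbert space with an orthonormal basis, identified with coefficient vectors; this is a special case, and the equality is not available in general Banach spaces (compare the discussion of the trigonometric system in $L_p$, $p \ne 2$, in Section 3.2). The function names and the sample coefficients are ours.

```python
import math
from itertools import combinations


def greedy_approximant(coeffs, m):
    """Keep the m coefficients largest in absolute value (ties broken by index)."""
    order = sorted(range(len(coeffs)), key=lambda k: (-abs(coeffs[k]), k))
    keep = set(order[:m])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]


def best_m_term_error(coeffs, m):
    """Brute-force best m-term error for an orthonormal basis.

    For a fixed index set the optimal coefficients are the expansion
    coefficients themselves, so only the index set has to be optimized.
    """
    best = float("inf")
    for idx in combinations(range(len(coeffs)), m):
        kept = set(idx)
        err = math.sqrt(sum(c * c for k, c in enumerate(coeffs) if k not in kept))
        best = min(best, err)
    return best


coeffs = [0.5, -2.0, 1.0, 0.25, -1.5]
m = 2
g = greedy_approximant(coeffs, m)
greedy_err = math.sqrt(sum((c - d) ** 2 for c, d in zip(coeffs, g)))
sigma_m = best_m_term_error(coeffs, m)
```

For this example the greedy approximant keeps the coefficients $-2.0$ and $-1.5$, and `greedy_err` equals `sigma_m`. The brute-force search is exponential in the dimension and is meant only as a check on small inputs.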

We present here some results in the case $\mathbb{B} = \mathbb{O}$, the set of orthonormal bases,
$F = W_q^r$, $X = L_p$, $1 \le q,p \le \infty$. First of all we formulate a result (see [KT1],
[T18]) that shows that in the case $p < 2$ we need some more restrictions on $\mathbb{B}$ in
order to get meaningful results (lower bounds).

Proposition 6.1. For any $1 \le p < 2$ there exists a complete in $L_2(0,1)$ orthonormal
system $\Phi$ such that for each $f \in L_p(0,1)$ we have $\sigma_1(f,\Phi)_p = 0$.

Let us restrict our further discussion to the case $p \ge 2$. This case was also more
interesting in the Linear Approximation discussion (see Section 5). Kashin [Ka2]
proved that

(6.1)  $\sigma_m(W_\infty^r, \mathbb{O})_2 \gg m^{-r}.$

We proved (see [DT1]) that

(6.2)  $\sigma_m(W_2^r, \mathcal{T})_\infty \ll m^{-r}.$

The estimates (6.1) and (6.2) imply that for $2 \le q,p \le \infty$ we have

(6.3)  $\sigma_m(W_q^r, \mathbb{O})_p \asymp \sigma_m(W_q^r, \mathcal{T})_p \asymp m^{-r}.$

Let us compare this relation with (5.2). We see that best $m$-term trigonometric
approximation provides the same accuracy as the best approximation from an optimal
$m$-dimensional subspace. An advantage of nonlinear approximation here is that we
use a natural basis instead of an existing but nonconstructive subspace. However, we
should note that the estimate (6.2) was proved in [DT1] as an existence theorem.
We did not give an algorithm to get (6.2) in [DT1] and do not know one now. The
Thresholding Greedy Algorithm does not provide the estimate (6.2). We have (see
[T19])

$$\sup_{f\in W_2^r} \|f - G_m(f,\mathcal{T})\|_\infty \asymp m^{-r+1/2}.$$

It is known from different results (see [DJP], [D], [T21]) that wavelets are well
designed for nonlinear approximation. We present here one general result in this
direction. We consider a basis $\Psi := \{\psi_I\}_{I\in D}$ enumerated by dyadic intervals $I$
of $[0,1]^d$, $I = I_1 \times \dots \times I_d$, where $I_j$ is a dyadic interval of $[0,1]$, $j = 1,\dots,d$, which
satisfies certain properties. Let $L_p := L_p(\Omega)$ with normalized Lebesgue measure
on $\Omega$, $|\Omega| = 1$. First of all we assume that for all $1 < q,p < \infty$ and $I \in D$, where
$D := D([0,1]^d)$ is the set of all dyadic intervals of $[0,1]^d$, we have

(6.4)  $\|\psi_I\|_p \asymp \|\psi_I\|_q\, |I|^{1/p-1/q},$

with constants independent of $I$. This property can be easily checked for a given
basis.

Next, assume that for any $s = (s_1,\dots,s_d) \in \mathbb{Z}^d$, $s_j \ge 0$, $j = 1,\dots,d$, and any
$\{c_I\}$ we have for $1 < p < \infty$

(6.5)  $\Big\|\sum_{I\in D_s} c_I\psi_I\Big\|_p^p \asymp \sum_{I\in D_s} \|c_I\psi_I\|_p^p,$

where

$$D_s := \{I = I_1 \times \dots \times I_d \in D : |I_j| = 2^{-s_j},\ j = 1,\dots,d\}.$$

This assumption allows us to estimate the $L_p$-norm of a dyadic block in terms of
Fourier coefficients.

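The third assumption is that $\Psi$ is a basis satisfying the Littlewood-Paley
inequality. This means the following. Let $1 < p < \infty$ and let $f \in L_p$ have an expansion

$$f = \sum_I f_I\psi_I.$$

We assume that

(6.6)  $\lim_{\min_j \mu_j \to \infty} \Big\|f - \sum_{s_j\le\mu_j,\ j=1,\dots,d}\ \sum_{I\in D_s} f_I\psi_I\Big\|_p = 0,$

and

(6.7)  $\|f\|_p \asymp \Big\|\Big(\sum_s \Big|\sum_{I\in D_s} f_I\psi_I\Big|^2\Big)^{1/2}\Big\|_p.$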

Let $\mu \in \mathbb{Z}^d$, $\mu_j \ge 0$, $j = 1,\dots,d$. Denote by $\Psi(\mu)$ the subspace of polynomials of
the form

$$\sum_{s_j\le\mu_j,\ j=1,\dots,d}\ \sum_{I\in D_s} c_I\psi_I.$$

We now define a function class. Let $R = (R_1,\dots,R_d)$, $R_j > 0$, $j = 1,\dots,d$, and

$$g(R) := \Big(\sum_{j=1}^d R_j^{-1}\Big)^{-1}.$$

For natural numbers $l$ denote

$$\Psi(R,l) := \Psi(\mu), \qquad \mu_j = [g(R)l/R_j], \quad j = 1,\dots,d.$$

We define the class $H_q^R(\Psi)$ as the set of functions $f \in L_q$ representable in the form

$$f = \sum_{l=1}^{\infty} t_l, \qquad t_l \in \Psi(R,l), \quad \|t_l\|_q \le 2^{-g(R)l}.$$

Theorem 6.1. Let $1 < q,p < \infty$ and $g(R) > (1/q - 1/p)_+$. Then for $\Psi$ satisfying
(6.4)-(6.7) we have

$$\sup_{f\in H_q^R(\Psi)} \|f - G_m^p(f,\Psi)\|_p \ll m^{-g(R)}.$$


In the periodic case the following basis $U^d := U \times \dots \times U$ can be taken in place of
$\Psi$ in Theorem 6.1. We define the system $U := \{U_I\}$ in the univariate case. Denote

$$U_n^+(x) := \sum_{k=0}^{2^n-1} e^{ikx} = \frac{e^{i2^n x} - 1}{e^{ix} - 1}, \qquad n = 0,1,2,\dots;$$

$$U_{n,k}^+(x) := e^{i2^n x}\, U_n^+(x - 2\pi k 2^{-n}), \qquad k = 0,1,\dots,2^n-1;$$

$$U_{n,k}^-(x) := e^{-i2^n x}\, U_n^+(-x + 2\pi k 2^{-n}), \qquad k = 0,1,\dots,2^n-1.$$

We normalize the system of functions $\{U_{n,k}^+, U_{n,k}^-\}$ in $L_2$ and enumerate it by dyadic
intervals. We write

$$U_I(x) := 2^{-n/2}\, U_{n,k}^+(x) \quad\text{with}\quad I = [(k+1/2)2^{-n}, (k+1)2^{-n});$$

$$U_I(x) := 2^{-n/2}\, U_{n,k}^-(x) \quad\text{with}\quad I = [k2^{-n}, (k+1/2)2^{-n});$$

and

$$U_{[0,1)}(x) := 1.$$
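
The orthonormality of the system $U$ can be verified numerically on its first few levels. The sketch below is ours and rests on the following reading of the definitions, which we state as an assumption: $U_{n,k}^+(x) = e^{i2^n x}\sum_{j=0}^{2^n-1} e^{ij(x - 2\pi k 2^{-n})}$, so that its Fourier coefficient at frequency $2^n + j$ is $e^{-2\pi i jk/2^n}$, and $U_{n,k}^-$ is the analogous function on the frequencies $-(2^n + j)$. Inner products are taken in $L_2$ with normalized measure and computed through Fourier coefficients; all function names are illustrative.

```python
import cmath
import math


def u_plus(n, k):
    """Fourier coefficients {frequency: coefficient} of U^+_{n,k}."""
    size = 2 ** n
    return {size + j: cmath.exp(-2j * math.pi * j * k / size) for j in range(size)}


def u_minus(n, k):
    """Fourier coefficients {frequency: coefficient} of U^-_{n,k}."""
    size = 2 ** n
    return {-(size + j): cmath.exp(2j * math.pi * j * k / size) for j in range(size)}


def inner(u, w):
    """L_2 inner product (normalized measure) computed from Fourier coefficients."""
    return sum(c * w[f].conjugate() for f, c in u.items() if f in w)


def normalized_level(n):
    """The 2^{n+1} functions of level n, scaled by 2^{-n/2}."""
    scale = 2.0 ** (-n / 2)
    funcs = [u_plus(n, k) for k in range(2 ** n)]
    funcs += [u_minus(n, k) for k in range(2 ** n)]
    return [{f: scale * c for f, c in u.items()} for u in funcs]


# The constant function together with levels n = 0, 1, 2.
system = [{0: 1.0 + 0.0j}]
for level in range(3):
    system.extend(normalized_level(level))

# Largest deviation of the Gram matrix from the identity matrix.
gram_error = max(
    abs(inner(u, w) - (1.0 if a == b else 0.0))
    for a, u in enumerate(system)
    for b, w in enumerate(system)
)
```

Functions from different levels occupy disjoint frequency blocks, and within one level the $2^n$ shifts are orthogonal because the sums $\sum_j e^{-2\pi i j(k-k')/2^n}$ vanish for $k \ne k'$; the computed Gram matrix agrees with the identity up to rounding.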

It is well known that $H_p^R(U^d)$ is equivalent to the standard anisotropic multivariate
periodic Hölder-Nikol'skii classes $NH_p^R$. We define these classes in the
following way. The class $NH_p^R$, $R = (R_1,\dots,R_d)$ and $1 \le p \le \infty$, is the set of
periodic functions $f \in L_p([0,2\pi]^d)$ such that for each $l_j = [R_j] + 1$, $j = 1,\dots,d$,
the following relations hold

(6.8)  $\|f\|_p \le 1, \qquad \|\Delta_t^{l_j,j} f\|_p \le |t|^{R_j}, \quad j = 1,\dots,d,$

where $\Delta_t^{l,j}$ is the $l$-th difference with step $t$ in the variable $x_j$. In the case $d = 1$
$NH_p^R$ coincides with the standard Hölder class $H_p^R$. Then Theorem 6.1 gives the following.

Theorem 6.2. Let $1 < q,p < \infty$; then for $R$ such that $g(R) > (1/q - 1/p)_+$ we
have

$$\sup_{f\in NH_q^R} \|f - G_m^p(f,U^d)\|_p \ll m^{-g(R)}.$$

We also proved in [T21] that the basis $U^d$ is an optimal orthonormal basis for
approximation of classes $NH_q^R$ in $L_p$:

(6.9)  $\sigma_m(NH_q^R, \mathbb{O})_p \asymp \sigma_m(NH_q^R, U^d)_p \asymp m^{-g(R)}$

for $1 < q < \infty$, $2 \le p < \infty$, $g(R) > (1/q - 1/p)_+$. It is important to remark that
Theorem 6.2 guarantees that the estimate in (6.9) can be realized by TGA with
regard to $U^d$.

Open  problem  6.1.  Find  a  constructive  proof  of  (6.2)  (provide  an  algorithm). 
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7.  Universality 

In this section we discuss, in the model case of anisotropic function classes, a general approach formulated in the Introduction of how to choose a good basis (dictionary) for approximation. This approach consists of several steps. We concentrate here on nonlinear approximation and compare realizations of this approach for linear and nonlinear approximations. The first step in this approach is an optimization problem. In both cases (linear and nonlinear) we begin with a function class $F$ in a given Banach space $X$. A classical example of an optimization problem in the linear case is the problem of finding (estimating) the Kolmogorov width $d_m(F, X)$. This concept allows us to choose, among various Chebyshev methods (best approximation) having the same dimension of the approximating subspace, the one which has the best accuracy. The asymptotic behavior (in the sense of order) of the sequence $\{d_m(F, X)\}_{m=1}^\infty$ is known for a number of function classes and Banach spaces. It turned out that in many cases, for instance, in the case where $F = W_p^r$ is a standard Sobolev class and $X = L_p$, the optimal (in the sense of order) $m$-dimensional subspaces can be formed as subspaces spanned by $m$ elements from one orthogonal system. We describe this for the multivariate periodic Hölder-Nikol'skii classes $NH_p^R$. It is known (see for instance [Te2]) that

$$d_m(NH_p^R, L_p) \asymp m^{-g(R)}, \quad 1 \le p \le \infty. \tag{7.1}$$

It is also known that the subspaces of trigonometric polynomials $T(R, l)$ with frequencies $k$ satisfying the inequalities

$$|k_j| < 2^{g(R)l/R_j}, \quad j = 1, \dots, d,$$

can be chosen to realize (7.1). In this case $l$ is set to be the largest satisfying $\dim T(R, l) \le m$. We stress here that the optimal (in the sense of order) subspaces $T(R, l)$ are different for different $R$ and are formed from the same (trigonometric) system.

A nonlinear analog of the Kolmogorov $m$-width setting was discussed in Section 6. In this section we consider only the case $\mathbb D = \mathbb O$, the set of all orthogonal bases on a given domain. In Section 6 we mentioned that

$$\sigma_m(NH_q^R, \mathbb O)_{L_p} \asymp m^{-g(R)} \tag{7.2}$$

for

$$1 < q < \infty, \quad 2 \le p < \infty, \quad g(R) > (1/q - 1/p)_+.$$

It is important to remark that the basis $U^d$ realizes (7.2) for all $R$ (see the definition of $U^d$ in Section 3).

The second step in our approach is to look for a universal basis (dictionary) for approximation. The above-mentioned result on the basis $U^d$ means that $U^d$ is universal for the pair $(\mathcal F_q([A, B]), \mathbb O)$ and the space $X = L_p([0, 2\pi]^d)$ for $A$ such that $g(A) > (1/q - 1/p)_+$, $1 < q < \infty$, $2 \le p < \infty$, where

$$\mathcal F_q([A, B]) := \{NH_q^R : 0 < A_j \le R_j \le B_j < \infty, \ j = 1, \dots, d\}.$$

It is interesting to compare this result on universal bases in nonlinear approximation with the corresponding result in the linear setting. We define the index $\kappa(m, \mathcal F, X)$ of universality for a collection $\mathcal F$ with respect to the Kolmogorov width in $X$:

$$\kappa(m, \mathcal F, X) := L(m, \mathcal F, X)/m,$$

where $L(m, \mathcal F, X)$ is the smallest number among those $L$ for which there is a system of functions such that for each $F \in \mathcal F$ we have


L 


It is proved in [T8] (see also [Te2, Ch.3, S.5]) that for any $A$, $B$ such that $B_j > A_j$, $j = 1, \dots, d$, we have

$$\kappa(m, \mathcal F_p([A, B]), L_p) \gg (\log m)^{d-1}, \quad 1 < p < \infty. \tag{7.3}$$

The estimate (7.3) implies that there are no Chebyshev methods universal for a nontrivial collection of anisotropic function classes. Thus, from the point of view of existence of universal methods, the nonlinear setting has an advantage over the linear setting.

After two steps of realizing our approach in nonlinear approximation we get a universal dictionary $\mathcal D_u$ for a collection of function classes $\mathcal F$, say, $U^d$ for $\mathcal F_q([A, B])$. This means that the dictionary $\mathcal D_u$ is well designed for best $m$-term approximation of functions from function classes in the given collection. The third step is to find an algorithm (theoretical first) to realize best (near best) $m$-term approximation with regard to $\mathcal D_u$. It turned out that in the model case of $\mathcal F_q([A, B])$ and the basis $U^d$ there is a simple algorithm which realizes near best $m$-term approximation for classes $NH_q^R$. This is the Thresholding Greedy Algorithm (see Theorem 6.2).

Thus we have established that in the above model case the basis $U^d$ is optimal for nonlinear $m$-term approximation in a very strong sense. The following two features of $U^d$ are the most important: 1) $U^d$ is the tensor product of the univariate basis $U$; 2) the univariate basis $U$ is a wavelet type basis. It is known [W1] that $U$ is $L_p$-equivalent, $1 < p < \infty$, to the Haar basis. Then by Theorem 3.1 $U$ is a greedy basis for $L_p$, $1 < p < \infty$. The tensor product structure of $U^d$ is important in making $U^d$ a universal basis for a collection of anisotropic Hölder-Nikol'skii classes. It would be ideal if $U^d$ were a greedy basis for $L_p(\mathbb T^d)$, $1 < p < \infty$. Unfortunately, this is not the case. We have that for $1 < p < \infty$

$$\sup_{f \in L_p} \|f - G_m^p(f, U^d)\|_p / \sigma_m(f, U^d)_p \asymp (\log m)^{(d-1)|1/2 - 1/p|}. \tag{7.4}$$

This relation follows from its analog with $U^d$ replaced by the multivariate Haar system $\mathcal H^d := \mathcal H \times \cdots \times \mathcal H$. The lower estimate in (7.4) for $\mathcal H^d$ was proved by R. Hochmuth; the upper estimate in (7.4) for $\mathcal H^d$ was proved in the case $d = 2$, $4/3 \le p \le 4$, and was conjectured for all $d$, $1 < p < \infty$, in [T15]. The conjecture was proved in [W2].

8.  Greedy  Algorithms  in  Hilbert  spaces 

Perhaps the first example of $m$-term approximation with regard to a redundant dictionary was considered by E. Schmidt in 1907 [S], who considered the approximation of functions $f(x, y)$ of two variables by bilinear forms

$$\sum_{i=1}^m u_i(x) v_i(y)$$

in $L_2([0,1]^2)$. This problem is closely connected with properties of the integral operator

$$J_f(g) := \int_0^1 f(x, y) g(y)\, dy$$

with kernel $f(x, y)$. E. Schmidt [S] gave an expansion (known as the Schmidt expansion)

$$f(x, y) = \sum_{j=1}^\infty s_j(J_f) \phi_j(x) \psi_j(y) \tag{8.S}$$

where $\{s_j(J_f)\}$ is a nonincreasing sequence of singular numbers of $J_f$, i.e. $s_j(J_f) := \lambda_j(J_f^* J_f)^{1/2}$, $\{\lambda_j(A)\}$ is a sequence of eigenvalues of an operator $A$, and $J_f^*$ is the adjoint operator to $J_f$. The two sequences $\{\phi_j(x)\}$ and $\{\psi_j(y)\}$ form orthonormal sequences of eigenfunctions of the operators $J_f J_f^*$ and $J_f^* J_f$ respectively. He also proved that

$$\Big\| f(x, y) - \sum_{j=1}^m s_j(J_f) \phi_j(x) \psi_j(y) \Big\|_{L_2} = \inf_{u_j, v_j \in L_2, \ j = 1, \dots, m} \Big\| f(x, y) - \sum_{j=1}^m u_j(x) v_j(y) \Big\|_{L_2}.$$

It follows from the Schmidt expansion that the above best bilinear approximation can be realized by the Pure Greedy Algorithm. This was observed and used in several papers (see [Po] for history).
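A discrete sketch of the Schmidt result may help. It rests on assumptions that are ours, not the paper's: the kernel $f(x, y)$ is sampled on a grid, so it becomes a matrix `F`, the dictionary is the set of unit-norm rank-one matrices, and the $L_2$ norm becomes the Frobenius norm. Under these assumptions a Pure Greedy step picks the leading singular pair of the current residual, and $m$ greedy steps reproduce the truncated singular value decomposition. All function names are illustrative.

```python
import numpy as np

def pure_greedy_rank_one(F, m):
    """Run m Pure Greedy steps on F with the dictionary of unit-norm rank-one matrices."""
    residual = F.copy()
    approx = np.zeros_like(F)
    for _ in range(m):
        # The rank-one matrix u v^T with the largest inner product with the
        # residual is given by the leading singular pair of the residual.
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        step = s[0] * np.outer(U[:, 0], Vt[0, :])
        approx += step
        residual -= step
    return approx

def best_bilinear(F, m):
    """Truncated Schmidt (singular value) expansion with m terms."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return (U[:, :m] * s[:m]) @ Vt[:m, :]

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 6))      # sampled kernel (illustrative data)
m = 3
greedy_error = np.linalg.norm(F - pure_greedy_rank_one(F, m))
best_error = np.linalg.norm(F - best_bilinear(F, m))
# Error of the best m-term bilinear approximation: root of the tail sum of s_j^2.
tail = np.sqrt(np.sum(np.linalg.svd(F, compute_uv=False)[m:] ** 2))
print(greedy_error, best_error, tail)
```

The three printed numbers should agree up to rounding, which is the finite-dimensional form of the statement that the greedy and the best bilinear approximations coincide.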

Another problem of this type which is well known in statistics is the projection pursuit regression problem. We formulate the related results in the function theory language. The problem is to approximate in $L_2$ a given function $f \in L_2$ by a sum of ridge functions, i.e. by

$$\sum_{j=1}^m r_j(\omega_j \cdot x), \quad x, \omega_j \in \mathbb R^d, \quad j = 1, \dots, m,$$

where $r_j$, $j = 1, \dots, m$, are univariate functions. The following greedy type algorithm (projection pursuit) was proposed in [FS] to solve this problem. Assume functions $r_1, \dots, r_{m-1}$ and vectors $\omega_1, \dots, \omega_{m-1}$ have been determined after $m - 1$ steps of the algorithm. Choose at the $m$-th step a unit vector $\omega_m$ and a function $r_m$ to minimize the error

$$\Big\| f(x) - \sum_{j=1}^m r_j(\omega_j \cdot x) \Big\|_{L_2}.$$

This is one more example of the Pure Greedy Algorithm. The Pure Greedy Algorithm and some other versions of greedy type algorithms have been intensively studied recently (see [B], [DDGS], [DMA], [Du], [DT2], [DT3], [H], [J1], [J2], [T14-24]). In this section we discuss PGA and some of its modifications which make it more ready for implementation. We call this new type of greedy algorithms Weak Greedy Algorithms (see the Introduction for definitions of PGA and WGA).

If $H_0$ is a finite dimensional subspace of $H$, we let $P_{H_0}$ be the orthogonal projector from $H$ onto $H_0$. That is, $P_{H_0}(f)$ is the best approximation to $f$ from $H_0$. We let $g(f) \in \mathcal D$ be an element from $\mathcal D$ which maximizes $|\langle f, g \rangle|$. We shall assume for simplicity that such a maximizer exists; if not, suitable modifications are necessary (see Weak Orthogonal Greedy Algorithm below) in the algorithm that follows.
Orthogonal Greedy Algorithm (OGA). We define $R_0^o(f) := R_0^o(f, \mathcal D) := f$ and $G_0^o(f) := G_0^o(f, \mathcal D) := 0$. Then for each $m \ge 1$, we inductively define

$$H_m := H_m(f) := \operatorname{span}\{g(R_0^o(f)), \dots, g(R_{m-1}^o(f))\}$$

$$G_m^o(f) := G_m^o(f, \mathcal D) := P_{H_m}(f)$$

$$R_m^o(f) := R_m^o(f, \mathcal D) := f - G_m^o(f).$$

We remark that for each $f$ we have

$$\|f - G_m^o(f, \mathcal D)\| \le \|R_{m-1}^o(f) - G_1(R_{m-1}^o(f), \mathcal D)\|. \tag{8.1}$$
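The definition of OGA can be sketched in a finite-dimensional setting. The sketch below assumes $H = \mathbb R^n$ with the Euclidean inner product and a finite dictionary stored as the unit-norm columns of a matrix `D`; both assumptions, and all names, are ours. The projection $P_{H_m}(f)$ is computed by least squares, and the list `one_step` records the right-hand side of (8.1) so that the inequality can be inspected numerically.

```python
import numpy as np

def orthogonal_greedy(f, D, m):
    """Run m OGA steps for f in R^n; the columns of D are unit-norm dictionary elements."""
    selected = []                        # indices of the chosen dictionary elements
    residual = f.copy()                  # R_0^o(f) = f
    norms = [np.linalg.norm(residual)]
    one_step = []                        # ||R_{m-1} - G_1(R_{m-1})||, right side of (8.1)
    for _ in range(m):
        inner = D.T @ residual
        k = int(np.argmax(np.abs(inner)))            # g(R_{m-1}) maximizes |<R_{m-1}, g>|
        one_step.append(np.linalg.norm(residual - inner[k] * D[:, k]))
        if k not in selected:
            selected.append(k)
        # Orthogonal projection of f onto H_m, the span of the selected elements.
        coef, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        residual = f - D[:, selected] @ coef
        norms.append(np.linalg.norm(residual))
    return selected, norms, one_step

rng = np.random.default_rng(1)
D = rng.standard_normal((10, 30))
D /= np.linalg.norm(D, axis=0)           # normalize every column
f = rng.standard_normal(10)
selected, norms, one_step = orthogonal_greedy(f, D, 5)
print(norms)
```

Each entry `norms[m]` should not exceed `one_step[m - 1]`, in agreement with (8.1): projecting onto the whole span can only improve on a single Pure Greedy step applied to the previous residual.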

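Let a sequence $\tau = \{t_k\}_{k=1}^\infty$, $0 \le t_k \le 1$, be given. Following [T20] we define the Weak Orthogonal Greedy Algorithm.

Weak Orthogonal Greedy Algorithm (WOGA). We define $f_0^{o,\tau} := f$. Then for each $m \ge 1$ we inductively define:

1). $\varphi_m^{o,\tau} \in \mathcal D$ is any satisfying

$$|\langle f_{m-1}^{o,\tau}, \varphi_m^{o,\tau} \rangle| \ge t_m \sup_{g \in \mathcal D} |\langle f_{m-1}^{o,\tau}, g \rangle|;$$

2).

$$G_m^{o,\tau}(f, \mathcal D) := P_{H_m^\tau}(f), \quad\text{where}\quad H_m^\tau := \operatorname{span}(\varphi_1^{o,\tau}, \dots, \varphi_m^{o,\tau});$$

3).

$$f_m^{o,\tau} := f - G_m^{o,\tau}(f, \mathcal D).$$

It is clear that $G_m^\tau$ and $G_m^{o,\tau}$ in the case $t_k = 1$, $k = 1, 2, \dots$, coincide with PGA $G_m$ and OGA $G_m^o$ respectively. It is also clear that WGA and WOGA are more ready for implementation than PGA and OGA.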
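The weak selection rule can be sketched as follows, again assuming $H = \mathbb R^n$ and a finite dictionary given by the unit-norm columns of `D`. The WGA update used here (subtract the component of the residual along the chosen element) follows the usual definition from the Introduction, which is not restated in this section, so treat it as an assumption. The rule "take the first admissible index" is a hypothetical choice: any element within the factor $t$ of the supremum is allowed.

```python
import numpy as np

def weak_greedy(f, D, t, m, orthogonal=False):
    """WGA (orthogonal=False) or WOGA (orthogonal=True) with constant weakness t_k = t."""
    residual = f.copy()
    chosen = []
    norms = [np.linalg.norm(residual)]
    for _ in range(m):
        inner = np.abs(D.T @ residual)
        # First index whose inner product is within the factor t of the maximum.
        k = int(np.argmax(inner >= t * inner.max()))
        chosen.append(k)
        if orthogonal:
            # WOGA: project f onto the span of all elements chosen so far.
            cols = D[:, sorted(set(chosen))]
            coef, *_ = np.linalg.lstsq(cols, f, rcond=None)
            residual = f - cols @ coef
        else:
            # WGA: remove the component of the residual along the chosen element.
            residual = residual - (D[:, k] @ residual) * D[:, k]
        norms.append(np.linalg.norm(residual))
    return norms

rng = np.random.default_rng(2)
D = rng.standard_normal((12, 40))
D /= np.linalg.norm(D, axis=0)
c = np.zeros(40)
c[:6] = rng.uniform(-1.0, 1.0, 6)
c /= np.abs(c).sum()                     # coefficients with sum |c_k| = M = 1
f = D @ c
t, M = 0.5, 1.0
wga = weak_greedy(f, D, t, 8)
woga = weak_greedy(f, D, t, 8, orthogonal=True)
print(wga[-1], woga[-1])
```

With $t = 1$ the selection reduces to the maximizer, so the same function then runs PGA and OGA.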

8.1. Convergence. The convergence of PGA and WGA with $t_k = t$, $0 < t < 1$, was established in [J1] and [RW]. The first sufficient condition on $\tau$ which includes sequences with $\liminf_{k \to \infty} t_k = 0$ was obtained in [T20].

Theorem 8.1. Assume

$$\sum_{k=1}^\infty \frac{t_k}{k} = \infty. \tag{8.2}$$

Then for any dictionary $\mathcal D$ and any $f \in H$ we have

$$\lim_{m \to \infty} \|f - G_m^\tau(f, \mathcal D)\| = 0.$$

In [T20] we reduced the proof of convergence of WGA with weakness sequence $\tau$ to some properties of $\ell_2$-sequences with regard to $\tau$. Theorem 8.1 was derived from the following two statements proved in [T20].

Proposition 8.1. Let $\tau$ be such that for any $\{a_j\}_{j=1}^\infty \in \ell_2$, $a_j \ge 0$, $j = 1, 2, \dots$ we have

$$\liminf_{n \to \infty} a_n \sum_{j=1}^n a_j / t_n = 0.$$

Then for any $H$, $\mathcal D$, and $f \in H$ we have

$$\lim_{m \to \infty} \|f_m^\tau\| = 0.$$
Proposition 8.2. If $\tau$ satisfies the condition (8.2) then $\tau$ satisfies the assumption of Proposition 8.1.

The following simple necessary condition

$$\sum_{k=1}^\infty t_k^2 = \infty$$

was mentioned in [T20]. The first nontrivial necessary conditions were obtained in [LTe]. We proved in [LTe] the following theorem.

Theorem 8.2. In the class of monotone sequences $\tau = \{t_k\}_{k=1}^\infty$, $1 \ge t_1 \ge t_2 \ge \cdots \ge 0$, the condition (8.2) is necessary and sufficient for convergence of the Weak Greedy Algorithm for each $f$ and all Hilbert spaces $H$ and dictionaries $\mathcal D$.

The proof of this theorem is based on a special procedure which we called the Equalizer. In [LTe] we gave an example of a class of sequences $\tau$ for which the condition (8.2) is not a necessary condition for convergence. We also proved in [LTe] a theorem which covers Theorem 8.1.

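Theorem 8.3. Assume

$$\sum_{s=0}^\infty \Big( 2^{-s} \sum_{k=2^s}^{2^{s+1}-1} t_k^2 \Big)^{1/2} = \infty.$$

Then for any dictionary $\mathcal D$ and any $f \in H$ we have

$$\lim_{m \to \infty} \|f - G_m^\tau(f, \mathcal D)\| = 0.$$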

We proved in [T23] a criterion on $\tau$ for convergence of WGA. Let us introduce some notation.

We define by $\mathcal V$ the class of sequences $x = \{x_k\}_{k=1}^\infty$, $x_k \ge 0$, $k = 1, 2, \dots$, with the following property: there exists a sequence $0 = q_0 < q_1 < \cdots$ such that

$$\sum_{s=1}^\infty \frac{2^s}{\Delta q_s} < \infty \tag{8.3}$$

and

$$\sum_{s=1}^\infty 2^{-s} \sum_{k=1}^{q_s} x_k^2 < \infty, \tag{8.4}$$

where $\Delta q_s := q_s - q_{s-1}$.

Theorem 8.4. The condition $\tau \notin \mathcal V$ is necessary and sufficient for convergence of the Weak Greedy Algorithm with weakness sequence $\tau$ for each $f$ and all Hilbert spaces $H$ and dictionaries $\mathcal D$.

The proof of the sufficient part of Theorem 8.4 is a refinement of the original proof of Jones [J1]. The study of the behavior of sequences $a_n \sum_{j=1}^n a_j$ for $\{a_j\}_{j=1}^\infty \in \ell_2$, $a_j \ge 0$, $j = 1, 2, \dots$, plays an important role in the proof. It turns out that the class $\mathcal V$ appears naturally in the study of the above-mentioned sequences. We proved in [T23] the following theorem.
Theorem 8.5. The following two conditions are equivalent:

(C.1) $\tau \notin \mathcal V$,

(C.2) $\forall \{a_j\}_{j=1}^\infty \in \ell_2$, $a_j \ge 0$: $\ \liminf_{n \to \infty} a_n \sum_{j=1}^n a_j / t_n = 0$.

We give a result on convergence of WOGA now.

Theorem 8.6. Assume

$$\sum_{k=1}^\infty t_k^2 = \infty. \tag{8.5}$$

Then for any dictionary $\mathcal D$ and any $f \in H$ we have

$$\lim_{m \to \infty} \|f - G_m^{o,\tau}(f, \mathcal D)\| = 0. \tag{8.6}$$


Remark 8.1. It is easy to see that in the case $\mathcal D = \mathcal B$, an orthonormal basis, the assumption (8.5) is also necessary for the convergence (8.6) for all $f$.

Theorems 8.4 and 8.6 show that conditions on the weakness sequence for convergence of WGA and WOGA are different.
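A concrete weakness sequence illustrates the difference. The worked example below takes (8.2) to be the condition $\sum_k t_k / k = \infty$ and (8.5) to be the condition $\sum_k t_k^2 = \infty$; the particular choice $t_k = k^{-1/2}$ is ours and serves only as an illustration.

```latex
t_k = k^{-1/2}: \qquad
\sum_{k=1}^{\infty} t_k^2 = \sum_{k=1}^{\infty} \frac{1}{k} = \infty ,
\qquad
\sum_{k=1}^{\infty} \frac{t_k}{k} = \sum_{k=1}^{\infty} k^{-3/2} < \infty .
```

For this sequence Theorem 8.6 gives convergence of WOGA for every $f$ and every dictionary. The sequence is monotone and violates (8.2), so by Theorem 8.2 there are a Hilbert space, a dictionary and an element for which WGA with the same $\tau$ does not converge.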

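8.2 Rate of convergence. For a general dictionary $\mathcal D$ we define the class of functions

$$\mathcal A_1^o(\mathcal D, M) := \Big\{ f \in H : f = \sum_{k \in \Lambda} c_k w_k, \ w_k \in \mathcal D, \ \#\Lambda < \infty \ \text{and} \ \sum_{k \in \Lambda} |c_k| \le M \Big\}$$

and we define $\mathcal A_1(\mathcal D, M)$ as the closure (in $H$) of $\mathcal A_1^o(\mathcal D, M)$. Furthermore, we define $\mathcal A_1(\mathcal D)$ as the union of the classes $\mathcal A_1(\mathcal D, M)$ over all $M > 0$. For $f \in \mathcal A_1(\mathcal D)$, we define the norm

$$|f|_{\mathcal A_1(\mathcal D)}$$

as the smallest $M$ such that $f \in \mathcal A_1(\mathcal D, M)$.

It was proved in [DT2] that for a general dictionary $\mathcal D$ the Pure Greedy Algorithm provides the following estimate

$$\|f - G_m(f, \mathcal D)\| \le |f|_{\mathcal A_1(\mathcal D)} m^{-1/6}. \tag{8.7}$$

(In this and similar estimates we consider that the inequality holds for all possible choices of $\{G_m\}$.) The paper [DT2] also contains an example of a dictionary $\mathcal D$ and an element $f$ such that (see Subsection 8.3 below)

$$\|f - G_m(f, \mathcal D)\| \ge \tfrac{1}{2} |f|_{\mathcal A_1(\mathcal D)} m^{-1/2}, \quad m \ge 4. \tag{8.8}$$

We proved in [KoT2] a new estimate

$$\|f - G_m(f, \mathcal D)\| \le 4 |f|_{\mathcal A_1(\mathcal D)} m^{-11/62} \tag{8.9}$$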
which improves a little the original one (see (8.7)).

E. Livshitz [Li] proved that there exist $\delta > 0$, a dictionary $\mathcal D$ and an element $f \in H$, $f \ne 0$, such that

$$\|f - G_m(f, \mathcal D)\| \ge C m^{-1/2+\delta} |f|_{\mathcal A_1(\mathcal D)} \tag{8.10}$$

with a positive constant $C$. We developed and refined ideas from [Li] in [T24] and proved the following lower estimate. There exist a dictionary $\mathcal D$ and an element $f \in H$, $f \ne 0$, such that

$$\|f - G_m(f, \mathcal D)\| \ge C m^{-0.27} |f|_{\mathcal A_1(\mathcal D)} \tag{8.11}$$

with a positive constant $C$.

For the WGA we have the following estimate [T20].

Theorem 8.7. Let $\mathcal D$ be an arbitrary dictionary in $H$. Assume $\tau := \{t_k\}_{k=1}^\infty$ is a nonincreasing sequence. Then for $f \in \mathcal A_1(\mathcal D, M)$ we have

$$\|f - G_m^\tau(f, \mathcal D)\| \le M \Big( 1 + \sum_{k=1}^m t_k^2 \Big)^{-\frac{t_m}{2(2 + t_m)}}. \tag{8.12}$$

In the particular case $\tau = t$ ($t_k = t$, $k = 1, 2, \dots$), (8.12) gives

$$\|f - G_m^t(f, \mathcal D)\| \le M (1 + m t^2)^{-t/(4+2t)}, \quad 0 < t \le 1.$$

This estimate implies the following inequality

$$\|f - G_m^t(f, \mathcal D)\| \le C_1(t) m^{-a_t} |f|_{\mathcal A_1(\mathcal D)}, \quad a_t \le 1/6, \tag{8.13}$$

with the exponent $a_t$ approaching 0 linearly in $t$. We proved in [T24] that this exponent cannot decrease to 0 slower than linearly.
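The passage from the constant-weakness estimate to a power of $m$ can be made explicit. The computation below is a sketch under the assumption that the exponent in (8.13) is $a_t = t/(4 + 2t)$, the exponent of the preceding estimate; the constant $C_1(t) = t^{-2a_t}$ is one admissible choice, not necessarily the one used in [T20].

```latex
a_t := \frac{t}{4 + 2t}, \qquad
\frac{t}{6} \le a_t \le \frac{t}{4} \quad (0 < t \le 1), \qquad
(1 + m t^2)^{-a_t} \le (m t^2)^{-a_t} = t^{-2 a_t}\, m^{-a_t}.
```

Taking $M = |f|_{\mathcal A_1(\mathcal D)}$ gives an estimate of the form (8.13) with $C_1(t) = t^{-2a_t}$, and the two-sided bound on $a_t$ shows the linear dependence on $t$.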

Theorem 8.8. There exists an absolute constant $b > 0$ such that for any $t > 0$ we can find a dictionary $\mathcal D_t$ and a function $f_t \in \mathcal A_1(\mathcal D_t)$ such that

$$\liminf_{m \to \infty} \|f_t - G_m^t(f_t, \mathcal D_t)\| \, m^{bt} / |f_t|_{\mathcal A_1(\mathcal D_t)} > 0.$$

We formulate one result for WOGA from [T20]. In the case of OGA this theorem was proved in [DT2].

Theorem 8.9. Let $\mathcal D$ be an arbitrary dictionary in $H$. Then for each $f \in \mathcal A_1(\mathcal D, M)$ we have

$$\|f - G_m^{o,\tau}(f, \mathcal D)\| \le M \Big( 1 + \sum_{k=1}^m t_k^2 \Big)^{-1/2}.$$

There is one more greedy type algorithm which works well for functions from the convex hull $A_1(\mathcal D) := \{f : |f|_{\mathcal A_1(\mathcal D)} \le 1\}$ of $\mathcal D^\pm$, where $\mathcal D^\pm := \{\pm g, \ g \in \mathcal D\}$.

There are several modifications of the Relaxed Greedy Algorithm (see for instance [B], [DT2]). Before giving the definition of the Weak Relaxed Greedy Algorithm (WRGA) we make one remark which helps to motivate the corresponding definition. Assume $G_{m-1} \in A_1(\mathcal D)$ is an approximant to $f \in A_1(\mathcal D)$ obtained at the $(m-1)$-th step. The major idea of relaxation in greedy algorithms is to look for an approximant at the $m$-th step of the form $G_m := (1 - a) G_{m-1} + a g$, $g \in \mathcal D^\pm$, $0 \le a \le 1$. This form guarantees that $G_m \in A_1(\mathcal D)$. We give now the definition of two versions of WRGA.
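Weak Relaxed Greedy Algorithms (WRGA). We define $f_0^{\tau,i} := f$ and $G_0^{\tau,i} := 0$ for $i = 1, 2$. Then for each $m \ge 1$ we inductively define

1). $\varphi_m^{\tau,1} \in \mathcal D^\pm$ is any satisfying

$$\langle f_{m-1}^{\tau,1}, \varphi_m^{\tau,1} - G_{m-1}^{\tau,1} \rangle \ge t_m \|f_{m-1}^{\tau,1}\|^2 \tag{8.14}$$

and

$$\|\varphi_m^{\tau,1} - G_{m-1}^{\tau,1}\| \ge \|f_{m-1}^{\tau,1}\|; \tag{8.15}$$

$\varphi_m^{\tau,2} \in \mathcal D^\pm$ is any satisfying

$$\langle f_{m-1}^{\tau,2}, \varphi_m^{\tau,2} - G_{m-1}^{\tau,2} \rangle \ge t_m \|f_{m-1}^{\tau,2}\|^2. \tag{8.16}$$

2).

$$G_m^{\tau,1} := G_m^{\tau,1}(f, \mathcal D) := (1 - \alpha_m) G_{m-1}^{\tau,1} + \alpha_m \varphi_m^{\tau,1},$$

$$\alpha_m := \langle f_{m-1}^{\tau,1}, \varphi_m^{\tau,1} - G_{m-1}^{\tau,1} \rangle \, \|\varphi_m^{\tau,1} - G_{m-1}^{\tau,1}\|^{-2};$$

$$G_m^{\tau,2} := G_m^{\tau,2}(f, \mathcal D) := (1 - \beta_m) G_{m-1}^{\tau,2} + \beta_m \varphi_m^{\tau,2},$$

$$\beta_m := t_m \Big( 1 + \sum_{k=1}^m t_k^2 \Big)^{-1} \quad\text{for } m \ge 1.$$

3).

$$f_m^{\tau,i} := f - G_m^{\tau,i}, \quad i = 1, 2.$$

We formulate now some theorems on convergence rates of greedy type algorithms WRGA for functions from $\mathcal A_1(\mathcal D, M)$.

Theorem 8.10. Let $\mathcal D$ be an arbitrary dictionary in $H$. Then for each $f \in A_1(\mathcal D)$ we have

$$\|f - G_m^{\tau,i}(f, \mathcal D)\| \le 2 \Big( 1 + \sum_{k=1}^m t_k^2 \Big)^{-1/2}, \quad i = 1, 2. \tag{8.17}$$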
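The relaxed update with the prescribed step $\beta_m = t_m (1 + \sum_{k \le m} t_k^2)^{-1}$ can be sketched in $\mathbb R^n$. The assumptions are ours: a finite dictionary stored as unit-norm columns of `D`, a constant weakness sequence $t_k = t$, and $f$ taken from the convex hull of $\mathcal D^\pm$. The sketch always picks the maximizer of $\langle f_{m-1}, g \rangle$ over $\mathcal D^\pm$; for $f$ in the convex hull this maximizer satisfies $\langle f_{m-1}, g - G_{m-1} \rangle \ge \|f_{m-1}\|^2$, so it is admissible for every $t \le 1$. The weakness parameter then enters only through the step size.

```python
import numpy as np

def weak_relaxed_greedy(f, D, t, m):
    """Relaxed greedy iteration with step beta_m = t / (1 + m t^2), constant weakness t."""
    Dpm = np.hstack([D, -D])             # symmetrized dictionary: columns g and -g
    G = np.zeros_like(f)                 # G_0 = 0
    norms = [np.linalg.norm(f)]
    for step in range(1, m + 1):
        residual = f - G
        # Maximizing <f_{m-1}, g - G_{m-1}> over g equals maximizing <f_{m-1}, g>.
        k = int(np.argmax(Dpm.T @ residual))
        beta = t / (1.0 + step * t * t)
        G = (1.0 - beta) * G + beta * Dpm[:, k]      # convex combination stays in the hull
        norms.append(np.linalg.norm(f - G))
    return G, norms

rng = np.random.default_rng(3)
D = rng.standard_normal((15, 50))
D /= np.linalg.norm(D, axis=0)
c = rng.uniform(-1.0, 1.0, 50)
c /= np.abs(c).sum()                     # sum |c_k| = 1, so f lies in the convex hull
f = D @ c
t = 0.5
G, norms = weak_relaxed_greedy(f, D, t, 200)
bounds = [2.0 * (1.0 + k * t * t) ** (-0.5) for k in range(201)]
print(norms[-1], bounds[-1])
```

The list `bounds` holds the values $2(1 + m t^2)^{-1/2}$, the form that the estimate of Theorem 8.10 takes for a constant weakness sequence, so the two lists can be compared entry by entry.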

We present some results from [T17] on $r$-greedy dictionaries (see Definition 1.3).

Definition 8.1. We say $\mathcal D$ is a $\lambda$-quasiorthogonal dictionary if for any $n \in \mathbb N$ and any $g_i \in \mathcal D$, $i = 1, \dots, n$, there exists a collection $\varphi_j \in \mathcal D$, $j = 1, \dots, M$, $M \le N := \lambda n$, with the properties:

$$g_i \in X_M := \operatorname{span}(\varphi_1, \dots, \varphi_M);$$

and for any $f \in X_M$ we have

$$\max_{1 \le j \le M} |\langle f, \varphi_j \rangle| \ge N^{-1/2} \|f\|. \tag{8.18}$$

Theorem 8.11. Let a given dictionary $\mathcal D$ be $\lambda$-quasiorthogonal and let $0 < r < (2\lambda)^{-1}$ be a real number. Then for any $f$ such that

$$\sigma_m(f, \mathcal D) \le m^{-r}, \quad m = 1, 2, \dots,$$

we have

$$\|f - G_m(f, \mathcal D)\| \le C(r, \lambda) m^{-r}, \quad m = 1, 2, \dots.$$

Remark 8.2. It is clear that an orthonormal dictionary is a 1-quasiorthogonal dictionary.

Remark 8.3. Theorem 8.11 holds if the assumption that $\mathcal D$ is $\lambda$-quasiorthogonal is replaced by the assumption that $\mathcal D$ is asymptotically $\lambda$-quasiorthogonal. In order to get the definition of an asymptotically $\lambda$-quasiorthogonal dictionary we replace $N$ in Definition 8.1 by $N(n)$ and instead of $N = \lambda n$ we require

$$\limsup_{n \to \infty} N(n)/n = \lambda.$$

Here are two examples of asymptotically $\lambda$-quasiorthogonal dictionaries.

Example 8.1. The dictionary $\chi := \{f = |J|^{-1/2} \chi_J, \ J \subset [0, 1)\}$, where $\chi_J$ is the characteristic function of an interval $J$, is an asymptotically 2-quasiorthogonal dictionary.

Example 8.2. The dictionary $\mathcal P(r)$ that consists of functions of the form $f = p \chi_J$, $\|f\| = 1$, where $p$ is an algebraic polynomial of degree $r - 1$ and $\chi_J$ is the characteristic function of an interval $J$, is asymptotically $2r$-quasiorthogonal.

Example 8.3. For given $\mu, \gamma \ge 1$ a dictionary $\mathcal D$ is called $(\mu, \gamma)$-semistable if for any $g_i \in \mathcal D$, $i = 1, \dots, n$, there exist elements $h_j \in \mathcal D$, $j = 1, \dots, M \le \mu n$, such that

$$g_i \in \operatorname{span}\{h_1, \dots, h_M\}$$

and for any $c_1, \dots, c_M$ we have

$$\Big\| \sum_{j=1}^M c_j h_j \Big\| \ge \gamma^{-1/2} \Big( \sum_{j=1}^M c_j^2 \Big)^{1/2}.$$

A $(\mu, \gamma)$-semistable dictionary $\mathcal D$ is $\mu\gamma$-quasiorthogonal.
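The last claim can be checked directly. The derivation below is a sketch under two assumptions of ours: the defining inequality of Definition 8.1 reads $\max_{1 \le j \le M} |\langle f, \varphi_j \rangle| \ge N^{-1/2} \|f\|$ for $f$ in the span, and the coefficients $c_j$ are real. Take $f = \sum_{j=1}^M c_j h_j$ in the span of the elements $h_j$ supplied by semistability.

```latex
\|f\|^2 = \Big\langle f, \sum_{j=1}^{M} c_j h_j \Big\rangle
  \le \max_{1 \le j \le M} |\langle f, h_j \rangle| \sum_{j=1}^{M} |c_j|
  \le \max_{1 \le j \le M} |\langle f, h_j \rangle| \, M^{1/2} \Big( \sum_{j=1}^{M} c_j^2 \Big)^{1/2}
  \le \max_{1 \le j \le M} |\langle f, h_j \rangle| \, (M \gamma)^{1/2} \|f\| .
```

Dividing by $\|f\|$ and using $M \le \mu n \le \mu\gamma n$ gives $\max_j |\langle f, h_j \rangle| \ge (\mu\gamma n)^{-1/2} \|f\|$, which is the required inequality with $N = \mu\gamma n$.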

8.3. Saturation property of Pure Greedy Algorithm. We consider in this subsection a generalization of the Pure Greedy Algorithm. Take a fixed number $n \in \mathbb N$ and define the basic step of the $n$-dimensional Greedy Algorithm as follows. Find an $n$-term polynomial

$$p_n(f) := p_n(f, \mathcal D) = \sum_{i=1}^n c_i g_i, \quad g_i \in \mathcal D, \ i = 1, \dots, n,$$

such that (we assume its existence)

$$\|f - p_n(f)\| = \sigma_n(f, \mathcal D).$$

Denote

$$G(n, f) := G(n, f, \mathcal D) := p_n(f), \quad R(n, f) := R(n, f, \mathcal D) := f - p_n(f).$$
$n$-dimensional Greedy Algorithm. We define $R_0(n, f) := f$ and $G_0(n, f) := 0$. Then, for each $m \ge 1$, we inductively define

$$G_m(n, f) := G_m(n, f, \mathcal D) := G_{m-1}(n, f) + G(n, R_{m-1}(n, f))$$

$$R_m(n, f) := R_m(n, f, \mathcal D) := f - G_m(n, f) = R(n, R_{m-1}(n, f)).$$

It is clear that a 1-dimensional Greedy Algorithm is a Pure Greedy Algorithm. For a general dictionary $\mathcal D$, and for any $0 < \beta \le 1$, we define the class of functions

$$\mathcal A_\beta^o(\mathcal D, M) := \Big\{ f \in H : f = \sum_{k \in \Lambda} c_k w_k, \ w_k \in \mathcal D, \ |\Lambda| < \infty \ \text{and} \ \sum_{k \in \Lambda} |c_k|^\beta \le M^\beta \Big\},$$

and we define $\mathcal A_\beta(\mathcal D, M)$ as the closure (in $H$) of $\mathcal A_\beta^o(\mathcal D, M)$. Furthermore, we define $\mathcal A_\beta(\mathcal D)$ as the union of the classes $\mathcal A_\beta(\mathcal D, M)$ over all $M > 0$. For $f \in \mathcal A_\beta(\mathcal D)$, we define the "quasinorm"

$$|f|_{\mathcal A_\beta(\mathcal D)}$$

as the smallest $M$ such that $f \in \mathcal A_\beta(\mathcal D, M)$. The following general estimate for the error in approximation of functions $f \in \mathcal A_\beta(\mathcal D)$, $\beta \le 1$, was proved in [DT2].

Theorem 8.12. If $f \in \mathcal A_\beta(\mathcal D)$, $\beta \le 1$, then for $\alpha := 1/\beta - 1/2$, we have

$$\sigma_m(f, \mathcal D) \le C |f|_{\mathcal A_\beta(\mathcal D)} m^{-\alpha}, \quad m = 1, 2, \dots, \tag{8.19}$$

where $C$ depends on $\beta$ if $\beta$ is small.

In [DT2] we gave an example which showed that replacing a dictionary $\mathcal B$ given by an orthogonal basis by a nonorthogonal redundant dictionary $\mathcal D$ may damage the efficiency of the Pure Greedy Algorithm. The dictionary $\mathcal D$ in our example differs from the dictionary $\mathcal B$ by only one suitably chosen element $g$.

Let $\{h_k\}_{k=1}^\infty$ be an orthonormal basis in a Hilbert space $H$ and let $\mathcal B = \{h_k\}_{k=1}^\infty$ be the corresponding dictionary. Consider the following element

$$g := A h_1 + A h_2 + a A \sum_{k \ge 3} (k(k+1))^{-1/2} h_k$$

with

$$A := (33/89)^{1/2} \quad\text{and}\quad a := (23/11)^{1/2}.$$

Then $\|g\| = 1$. We define the dictionary $\mathcal D = \mathcal B \cup \{g\}$.
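The normalization $\|g\| = 1$ can be verified by a short computation, which uses only the orthonormality of $\{h_k\}$ and a telescoping sum.

```latex
\sum_{k \ge 3} \frac{1}{k(k+1)} = \sum_{k \ge 3} \Big( \frac{1}{k} - \frac{1}{k+1} \Big) = \frac{1}{3},
\qquad
\|g\|^2 = 2A^2 + a^2 A^2 \sum_{k \ge 3} \frac{1}{k(k+1)}
        = A^2 \Big( 2 + \frac{a^2}{3} \Big)
        = \frac{33}{89} \Big( 2 + \frac{23}{33} \Big)
        = \frac{33}{89} \cdot \frac{89}{33} = 1 .
```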
Theorem 8.13. For the function

$$f = h_1 + h_2$$

which is in each space $\mathcal A_\beta(\mathcal D)$, $0 < \beta \le 1$, we have

$$\|f - G_m(f, \mathcal D)\| \ge m^{-1/2}, \quad m \ge 4. \tag{8.20}$$

We proved in [T17] that the $n$-dimensional Greedy Algorithm, like the Pure Greedy Algorithm, has a saturation property.
Theorem 8.14. For a given $n$ and any orthonormal basis $\{h_k\}_{k=1}^\infty$ there exists an element $g$ such that for the dictionary $\mathcal D = g \cup \{h_k\}_{k=1}^\infty$ there is an element $f$ which has the property: for any $0 < \beta \le 1$

$$\|f - G_m(n, f)\| / |f|_{\mathcal A_\beta(\mathcal D)} \ge C(\beta) n^{-1/\beta} (m + 2)^{-1/2}.$$


Open Problems.

8.1. Find the order of decay of the sequence

$$\gamma(m) := \sup_{f, \mathcal D, \{G_m\}} \big( \|f - G_m(f, \mathcal D)\| \, |f|_{\mathcal A_1(\mathcal D)}^{-1} \big),$$

where sup is taken over all dictionaries $\mathcal D$, all elements $f \in \mathcal A_1(\mathcal D) \setminus \{0\}$ and all possible choices of $\{G_m\}$.

8.2. Is there a greedy type algorithm realizing (8.19) for $0 < \beta \le 1$?


9.  Greedy  Algorithms  in  Banach  spaces 

In this section we present some results on greedy approximation with regard to redundant dictionaries in Banach spaces. These results are fragmentary and should be considered as an attempt to understand the role of redundancy and nonlinearity in the general setting of Banach spaces. There are no general results on convergence of the X-Greedy Algorithm and the Dual Greedy Algorithm. Some results about the performance of DGA can be found in [Du]. It is proved in [Du] that the assumption that $X$ is a smooth Banach space is a necessary and sufficient condition for the sequence $\{\|R_m^D(f, \mathcal D)\|_X\}$ to be strictly decreasing for each $f \in X$ and all dictionaries $\mathcal D$.

9.1. Uniformly smooth Banach spaces. Recently, we proved in [T22] one general convergence result for the generalization of WOGA to Banach spaces. We call this generalization the Weak Chebyshev Greedy Algorithm (WCGA). We will use the notation $\mathcal D^\pm := \{\pm g, \ g \in \mathcal D\}$ here. Let a weakness sequence $\tau = \{t_k\}_{k=1}^\infty$, $0 \le t_k \le 1$, be given.

Weak Chebyshev Greedy Algorithm (WCGA). We define $f_0^c := f_0^{c,\tau} := f$. Then for each $m \ge 1$ we inductively define

1). $\varphi_m^c := \varphi_m^{c,\tau} \in \mathcal D^\pm$ is any satisfying

$$F_{f_{m-1}^c}(\varphi_m^c) \ge t_m \sup_{g \in \mathcal D^\pm} F_{f_{m-1}^c}(g).$$

2). Define

$$\Phi_m := \Phi_m^\tau := \operatorname{span}\{\varphi_j^c\}_{j=1}^m,$$

and define $G_m^c := G_m^{c,\tau}$ to be the best approximant to $f$ from $\Phi_m$.

3). Denote

$$f_m^c := f_m^{c,\tau} := f - G_m^c.$$
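Step 1) of WCGA can be sketched for the finite-dimensional space $\ell_p^n$, $1 < p < \infty$. The sketch assumes that $F_f$ denotes the norming functional of $f$ (norm one, $F_f(f) = \|f\|$), which in $\ell_p^n$ is given by the vector $\operatorname{sign}(f_j) |f_j|^{p-1} / \|f\|_p^{p-1}$. The finite dictionary stored as columns of `D`, the choice "first admissible index", and all names are ours. The best approximation in step 2) requires a separate convex minimization and is not shown.

```python
import numpy as np

def norming_functional(f, p):
    """Vector w with <w, f> = ||f||_p and dual norm ||w||_{p'} = 1, for f != 0 and 1 < p < inf."""
    norm = np.linalg.norm(f, ord=p)
    return np.sign(f) * np.abs(f) ** (p - 1) / norm ** (p - 1)

def weak_chebyshev_choice(residual, D, t, p):
    """Step 1 of WCGA: index and sign of an element of D^{+-} within the factor t of the sup."""
    values = D.T @ norming_functional(residual, p)   # F(g) for every column g of D
    best = np.abs(values).max()                      # supremum of F over g and -g
    k = int(np.argmax(np.abs(values) >= t * best))   # first admissible index
    sign = 1.0 if values[k] >= 0 else -1.0           # pick g or -g so that F is nonnegative
    return k, sign

rng = np.random.default_rng(4)
D = rng.standard_normal((9, 25))
f = rng.standard_normal(9)
p, t = 3.0, 0.7
w = norming_functional(f, p)
k, sign = weak_chebyshev_choice(f, D, t, p)
print(k, sign, w @ f, np.linalg.norm(f, ord=p))
```

The same selection rule serves as step 1) of the Weak Dual Greedy Algorithm, since both algorithms choose the new element through the norming functional of the current residual.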

Let us give one more definition of a weak greedy type algorithm. We will not present results on it here.

Weak Dual Greedy Algorithm (WDGA). We define $f_0^D := f_0^{D,\tau} := f$. Then for each $m \ge 1$ we inductively define

1). $\varphi_m^D := \varphi_m^{D,\tau} \in \mathcal D^\pm$ is any satisfying

$$F_{f_{m-1}^D}(\varphi_m^D) \ge t_m \sup_{g \in \mathcal D^\pm} F_{f_{m-1}^D}(g).$$

2). Define $a_m$ as

$$\|f_{m-1}^D - a_m \varphi_m^D\| = \min_{a \in \mathbb R} \|f_{m-1}^D - a \varphi_m^D\|.$$

3). Denote

$$f_m^D := f_m^{D,\tau} := f_{m-1}^D - a_m \varphi_m^D.$$

We define now the generalization for Banach spaces of the Weak Relaxed Greedy Algorithm studied in [T20] in the case of a Hilbert space.

Weak Relaxed Greedy Algorithm (WRGA). We define $f_0^r := f_0^{r,\tau} := f$ and $G_0^r := G_0^{r,\tau} := 0$. Then for each $m \ge 1$ we inductively define

1). $\varphi_m^r := \varphi_m^{r,\tau} \in \mathcal D^\pm$ is any satisfying

$$F_{f_{m-1}^r}(\varphi_m^r - G_{m-1}^r) \ge t_m \sup_{g \in \mathcal D^\pm} F_{f_{m-1}^r}(g - G_{m-1}^r).$$

2). Find $0 \le \lambda_m \le 1$ such that

$$\|f - ((1 - \lambda_m) G_{m-1}^r + \lambda_m \varphi_m^r)\| = \inf_{0 \le \lambda \le 1} \|f - ((1 - \lambda) G_{m-1}^r + \lambda \varphi_m^r)\|$$

and define

$$G_m^r := G_m^{r,\tau} := (1 - \lambda_m) G_{m-1}^r + \lambda_m \varphi_m^r.$$

3). Denote

$$f_m^r := f_m^{r,\tau} := f - G_m^r.$$

Remark 9.1. It follows from the definition of WCGA, WDGA, and WRGA that the sequences $\{\|f_m^c\|\}$, $\{\|f_m^D\|\}$, and $\{\|f_m^r\|\}$ are nonincreasing sequences.

The term "weak" in these definitions means that at step 1) we do not shoot for the optimal element of the dictionary which realizes the corresponding sup but are satisfied with a weaker property than being optimal. The obvious reason for this is that we do not know in general that the optimal one exists. Another, practical reason is that the weaker the assumption, the easier it is to satisfy and therefore the easier to realize in practice. The Weak Relaxed Greedy Algorithm provides incremental approximants discussed in [DDGS]. In [DDGS] they also impose weaker assumptions ($\epsilon$-greedy) on an element of the dictionary than being optimal. For instance, for a given sequence $\epsilon_n > 0$, $n = 1, 2, \dots$, they take $0 \le \alpha_m \le 1$ and $g_m \in \mathcal D$ satisfying

$$\|f - ((1 - \alpha_m) G_{m-1} + \alpha_m g_m)\| \le \inf_{0 \le \alpha \le 1, \ g \in \mathcal D} \|f - ((1 - \alpha) G_{m-1} + \alpha g)\| + \epsilon_m$$

instead of trying to find optimal ones. Their approach is different from ours.
We discuss in this section the questions of convergence and the rate of convergence for the two above defined methods of approximation: WCGA and WRGA. It is clear that in the case of WRGA the assumption that $f$ belongs to the closure of the convex hull of $\mathcal D^\pm$ is natural. We denote the closure of the convex hull of $\mathcal D^\pm$ by $A(\mathcal D) := A_1(\mathcal D)$. It has been proven in [T20] (see Theorems 8.9 and 8.10 from Section 8) that in the case of a Hilbert space both algorithms WCGA and WRGA give the approximation error for the class $A(\mathcal D)$ of the order

$$\Big( 1 + \sum_{k=1}^m t_k^2 \Big)^{-1/2}.$$

We consider here approximation in uniformly smooth Banach spaces. For a Banach space $X$ we define the modulus of smoothness

$$\rho(u) := \sup_{\|x\| = \|y\| = 1} \Big( \frac{1}{2} (\|x + uy\| + \|x - uy\|) - 1 \Big).$$

The uniformly smooth Banach space is the one with the property

$$\lim_{u \to 0} \rho(u)/u = 0.$$

It is easy to see that for any Banach space $X$ its modulus of smoothness $\rho(u)$ is an even convex function satisfying the inequalities

$$\max(0, u - 1) \le \rho(u) \le u, \quad u \in (0, \infty). \tag{9.1}$$

It has been established in [DDGS] that the approximation error of an algorithm analogous to our WRGA with $t_k = 1$, $k = 1, 2, \dots$, for the class $A(\mathcal D)$ can be expressed in terms of the modulus of smoothness of the Banach space. Namely, if the modulus of smoothness $\rho$ of $X$ satisfies the inequality $\rho(u) \le \gamma u^q$, $q > 1$, then the error is of $O(m^{1/q - 1})$. It has been proven in [T22] that both algorithms WCGA and WRGA provide approximation for the class $A(\mathcal D)$ in a Banach space $X$ with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$, of order

$$\Big( 1 + \sum_{k=1}^m t_k^p \Big)^{-1/p}, \quad p := \frac{q}{q - 1}. \tag{9.2}$$
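Two special cases show how this order relates to the estimates quoted earlier. The computation below is only a consistency check, taking the order to be $(1 + \sum_{k=1}^m t_k^p)^{-1/p}$ with $p = q/(q-1)$ as in [T22]; the choices $t_k = 1$ and $q = 2$ are ours.

```latex
t_k = 1: \quad \Big( 1 + \sum_{k=1}^{m} t_k^{p} \Big)^{-1/p} = (1 + m)^{-1/p} \asymp m^{-(q-1)/q} = m^{1/q - 1};
\qquad
q = 2: \quad p = 2, \quad \Big( 1 + \sum_{k=1}^{m} t_k^{p} \Big)^{-1/p} = \Big( 1 + \sum_{k=1}^{m} t_k^{2} \Big)^{-1/2}.
```

The first case recovers the rate $O(m^{1/q-1})$ of [DDGS], and the second recovers the Hilbert space order stated above.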

We also proved (see a version of [T22] submitted for publication in Advances in Comp. Math.) that WCGA converges for any $f \in X$ and WRGA converges for any $f \in A(\mathcal D)$ if $\tau$ satisfies the condition

$$\sum_{m=1}^\infty t_m \xi_m(\rho, \tau, \theta) = \infty. \tag{9.3}$$

The sequences $\{\xi_m(\rho, \tau, \theta)\}$ are defined as follows.

Definition 9.1. Let $\rho(u)$ be an even convex function on $(-\infty, \infty)$ with the property: $\rho(2) \ge 1$ and

$$\lim_{u \to 0} \rho(u)/u = 0.$$

For any $\tau = \{t_k\}_{k=1}^\infty$, $0 < t_k \le 1$, and $0 < \theta \le 1/2$ we define $\xi_m := \xi_m(\rho, \tau, \theta)$ as a number $u$ satisfying the equation

$$\rho(u) = \theta t_m u. \tag{9.4}$$

In the particular case of $\rho(u) \asymp u^q$, $1 < q \le 2$, the relation (9.3) is equivalent to

$$\sum_{k=1}^\infty t_k^p = \infty, \quad p := \frac{q}{q - 1}. \tag{9.5}$$

We gave in [T22] an example which shows that (9.5) is a necessary condition for convergence of WCGA in Banach spaces with modulus of smoothness of power type $q$ for all $\mathcal D$ and $f \in X$.
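The equivalence of the two conditions in the power case can be seen by solving the defining equation $\rho(u) = \theta t_m u$ explicitly. The sketch below uses the model function $\rho(u) = \gamma u^q$ with an exact power, which is our simplification; for $\rho(u) \asymp u^q$ the same computation holds up to constants.

```latex
\gamma\, \xi_m^{\,q} = \theta\, t_m\, \xi_m
\;\Longrightarrow\;
\xi_m = \Big( \frac{\theta\, t_m}{\gamma} \Big)^{1/(q-1)},
\qquad
t_m\, \xi_m = \Big( \frac{\theta}{\gamma} \Big)^{1/(q-1)} t_m^{\,1 + 1/(q-1)}
            = \Big( \frac{\theta}{\gamma} \Big)^{1/(q-1)} t_m^{\,p},
\quad p = \frac{q}{q-1}.
```

Hence the series $\sum_m t_m \xi_m$ and $\sum_m t_m^p$ diverge simultaneously.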

It is well known (see for instance [DDGS], Lemma B.1) that in the case $X = L_p$, $1 \le p < \infty$, we have

(9.6)    $\rho(u) \le \begin{cases} u^p/p & \text{if } 1 \le p \le 2, \\ (p-1)u^2/2 & \text{if } 2 \le p < \infty. \end{cases}$

It is also known (see [LT], p.63) that for any $X$ with $\dim X = \infty$ one has

    $\rho(u) \ge (1 + u^2)^{1/2} - 1$

and for every $X$, $\dim X \ge 2$,

    $\rho(u) \ge C u^2, \qquad C > 0.$

This limits power type modulus of smoothness of nontrivial Banach spaces to the case $1 \le q \le 2$. The following theorem gives the rate of convergence of WCGA for $f$ in $A_1(\mathcal{D})$.

Theorem 9.1. Let $X$ be a uniformly smooth Banach space with the modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Then for a sequence $\tau := \{t_k\}_{k=1}^{\infty}$, $t_k \le 1$, $k = 1, 2, \dots$, we have for any $f \in A_1(\mathcal{D})$ that

    $\|f - G_m^{c,\tau}(f, \mathcal{D})\| \le C(q, \gamma)\left(1 + \sum_{k=1}^{m} t_k^p\right)^{-1/p}, \qquad p := \frac{q}{q-1},$

with a constant $C(q, \gamma)$ which may depend only on $q$ and $\gamma$.
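
To get a feeling for how the weakness sequence enters the bound of Theorem 9.1, one can evaluate the factor $(1 + \sum_{k \le m} t_k^p)^{-1/p}$ numerically. The helper below is a minimal sketch; the name `wcga_rate_bound` is hypothetical, and the constant $C(q,\gamma)$ is deliberately left out because the text does not specify it.

```python
def wcga_rate_bound(weakness, q):
    """Return (1 + sum_k t_k^p)^(-1/p) with p = q/(q-1).

    `weakness` holds t_1, ..., t_m; `q` is the power type of the modulus
    of smoothness, 1 < q <= 2. The constant C(q, gamma) is omitted.
    """
    if not 1.0 < q <= 2.0:
        raise ValueError("q must satisfy 1 < q <= 2")
    p = q / (q - 1.0)
    return (1.0 + sum(t ** p for t in weakness)) ** (-1.0 / p)


# Hilbert-space case q = 2 with t_k = 1: the factor equals (1 + m)^(-1/2).
print(wcga_rate_bound([1.0] * 3, q=2.0))  # 0.5
```

For a constant sequence $t_k = t$ the factor is $(1 + m t^p)^{-1/p}$, so a smaller weakness parameter only changes the constant and not the order $m^{-1/p}$.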

9.2. Finite-dimensional spaces. We discuss some results from [DT3] on X-Greedy Algorithms in the particular case of a finite-dimensional space $X = \mathbb{R}^n$, equipped with one of the standard norms $\ell_p$. We concentrate on finite-dimensional problems for the following reasons. It is well known how one can apply finite-dimensional results in studying smoothness classes. Next, we are interested in understanding the interplay of several parameters, including a parameter measuring the redundancy of a system $\mathcal{D}$. In this subsection it will be more convenient for us to use systems $\mathcal{D}$ that are not necessarily normalized. We note that the definition of the X-Greedy Algorithm does not depend on the normalization of a system.

We use the standard notation $\mathbb{R}^n$ for the $n$-dimensional space of real vectors, and the $\ell_p$-norm is defined as follows:

    $\|x\|_p := \left(\sum_{j=1}^{n} |x_j|^p\right)^{1/p}, \quad 1 \le p < \infty,$

    $\|x\|_\infty := \max_j |x_j|.$

Let $B_p^n$ denote the unit $\ell_p$-ball of $\mathbb{R}^n$.

We first give two theorems from [DT3] about $m$-term approximation in $\mathbb{R}^n$. In this subsection, we shall consider $m$-term approximation in the $\ell_p$ norm of certain sets $F \subset \mathbb{R}^n$. In Theorem 9.2, we use ideas from [KT1] to give a lower estimate for $m$-term approximation in the $\ell_1$ norm from a general dictionary to general sets $F \subset \mathbb{R}^n$. Lower estimates in the $\ell_1$ norm automatically provide lower estimates in the other $\ell_q$ norms, $q \ge 1$ (see Corollary 9.1).

We let $\mathrm{Vol}_n(S)$ denote the Euclidean $n$-dimensional volume of the set $S \subset \mathbb{R}^n$. We recall that the volume of the unit ball $B_p^n$, $1 \le p \le \infty$, in $\mathbb{R}^n$ can be estimated by

(9.7)    $C_1^n n^{-n/p} \le \mathrm{Vol}_n(B_p^n) \le C_2^n n^{-n/p},$

with $C_1, C_2 > 0$ absolute constants.

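Theorem 9.2. If $F \subset B_2^n$ satisfies

    $\mathrm{Vol}_n(F) \ge K^n \mathrm{Vol}_n(B_2^n)$

for some $0 < K \le 1$, then for any dictionary $\mathcal{D}$, $\#\mathcal{D} = N$, we have

    $\sigma_m(F, \mathcal{D})_1 \ge C K^2 n^{1/2} N^{-m/(n-m)}, \qquad m \le n/2,$

with $C > 0$ an absolute constant.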

Corollary 9.1. Let $F$ and $\mathcal{D}$ be as in Theorem 9.2. For any $1 \le q \le \infty$, we have

    $\sigma_m(F, \mathcal{D})_q \ge C K^2 n^{1/q - 1/2} N^{-m/(n-m)}, \qquad m \le n/2,$

with $C$ an absolute constant.
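
The passage from the $\ell_1$ estimate to the $\ell_q$ estimate uses only Hölder's inequality in $\mathbb{R}^n$. In the sketch below we write the lower bound of Theorem 9.2 as $n^{1/2} A$, where $A$ is shorthand (introduced here only for illustration) for all factors that do not depend on the norm.

```latex
\|x\|_1 \le n^{1-1/q}\,\|x\|_q \quad (x \in \mathbb{R}^n,\ 1 \le q \le \infty)
\;\Longrightarrow\;
\sigma_m(F,\mathcal{D})_q \;\ge\; n^{1/q-1}\,\sigma_m(F,\mathcal{D})_1
\;\ge\; n^{1/q-1}\cdot n^{1/2} A \;=\; n^{1/q-1/2} A .
```

The first implication holds because every $m$-term approximant $g$ of $f$ satisfies $\|f - g\|_1 \le n^{1-1/q}\|f - g\|_q$, and taking the infimum over $g$ and the supremum over $f \in F$ preserves the inequality.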

Corollary 9.2. Let $\mathcal{D}$ be as in Theorem 9.2. For any $1 \le p, q \le \infty$, we have

(9.8)    $\sigma_m(B_p^n, \mathcal{D})_q \ge C n^{1/q - 1/p} N^{-m/(n-m)}, \qquad m \le n/2,$

with $C$ an absolute constant.

Remark 9.2. In the case $N = a^n$ and $p = q$, the lower bound in Corollary 9.2 can be replaced by $C a^{-2m}$.

We shall next consider upper estimates for $\sigma_m(F, \mathcal{D})_p$. We begin with the following simple theorem.

Theorem 9.3. Let $X$ be any $n$-dimensional Banach space and let $B$ be its unit ball. For any $N$ there exists a system $\mathcal{D} \subset X$, $\#\mathcal{D} = N$, such that

(9.9)    $\sigma_m(B, \mathcal{D})_X \le \min(1, \epsilon_N^m), \qquad \epsilon_N := \frac{2}{N^{1/n} - 1}.$

We consider now the $\ell_p$-Greedy Algorithms, $1 \le p \le \infty$ (see Introduction for the definition). In the case $p = 2$, the $\ell_p$-Greedy Algorithm coincides with the Pure Greedy Algorithm. Then, $G_m^p(x) := G_m^p(x, \mathcal{D})$ is an $m$-term approximation to $x$ from $\mathcal{D}$ which we call the $m$-th greedy approximant. We note that the best approximation to $x \in \mathbb{R}^n$ from $\mathcal{D}$ is not necessarily unique and therefore $G_m^p(x)$ is not necessarily unique. We define

    $\bar\gamma_m^p(x, \mathcal{D})_q := \sup \|x - G_m^p(x, \mathcal{D})\|_q$

where the supremum is taken over all possible resulting $G_m^p(x, \mathcal{D})$. Similarly, we define

    $\underline\gamma_m^p(x, \mathcal{D})_q := \inf \|x - G_m^p(x, \mathcal{D})\|_q$

where the infimum is taken over all possible resulting $G_m^p(x, \mathcal{D})$. Thus, $\bar\gamma$ measures the worst possible error over all possible choices of best approximations in the greedy algorithm and $\underline\gamma$ represents the best possible error.

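More generally, for a class $F \subset \mathbb{R}^n$ we define

    $\bar\gamma_m^p(F, \mathcal{D})_q := \sup_{f \in F} \bar\gamma_m^p(f, \mathcal{D})_q$

with a similar definition for $\underline\gamma_m^p(F, \mathcal{D})_q$. In upper estimates for greedy approximation we would like to use $\bar\gamma$ and for lower estimates $\underline\gamma$.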

Theorem 9.3 shows that for $p = q$ and for each $a > 1$ there exists a dictionary $\mathcal{D}$, $\#\mathcal{D} = b^n$, $b = 2a + 1$, such that

    $\bar\gamma_m^p(B_p^n, \mathcal{D})_p \le a^{-m}.$

However, the dictionary $\mathcal{D}$ in that theorem is not very natural or easy to describe. This estimate and Remark 9.2 to Corollary 9.2 indicate that systems $\mathcal{D}$ with $\#\mathcal{D}$ of order $C^n$ play an important role in $m$-term approximation in $\mathbb{R}^n$. We proceed now to study a natural family of such systems. We present results from [DT3].

Let $M \ge 3$ be an integer and consider the partition of $[-1, 1]$ into $M$ disjoint intervals $I_i$ of equal length: $|I_i| = 2/M$, $i = 1, \dots, M$. We let $\xi_i$ denote the midpoint of the interval $I_i$, $i = 1, \dots, M$, and $\Xi := \{\xi_i\}_{i=1}^{M}$. We introduce the system

    $\mathcal{V}_M := \{x \in \mathbb{R}^n : x_j \in \Xi,\ j = 1, \dots, n\}.$

Clearly $|\mathcal{V}_M| = M^n$. We shall study in this section the $\ell_\infty$-Greedy Algorithm for the systems $\mathcal{V}_M$.

Theorem 9.4. For any $1 \le q \le \infty$ we have

(9.10)    $\bar\gamma_m^\infty(B_\infty^n, \mathcal{V}_M)_q \le n^{1/q} M^{-m}, \qquad m = 1, 2, \dots.$
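
The factor $M^{-m}$ in (9.10) can be made plausible by a simple rounding argument: every point of the cube $[-1,1]^n$ lies within $1/M$ (in the sup norm) of a point with all coordinates at interval midpoints, and the same step can be repeated on the rescaled residual. The sketch below implements this rounding scheme in plain Python. It is not the exact $\ell_\infty$-Greedy Algorithm (which picks a best one-term approximant at each step), and the function names are hypothetical; it only illustrates why an error reduction by $1/M$ per term is achievable.

```python
def midpoints(M):
    """Midpoints of the M equal subintervals of [-1, 1]."""
    return [-1.0 + (2 * i + 1) / M for i in range(M)]


def nearest_grid_vector(y, M):
    """Coordinatewise nearest grid vector to y, assuming max |y_j| <= 1."""
    grid = midpoints(M)
    return [min(grid, key=lambda g: abs(g - yj)) for yj in y]


def quantized_m_term(x, M, m):
    """Build sum_k c_k g_k with every g_k having midpoint coordinates.

    Each step rescales the residual into the unit cube, rounds it to the
    grid, and subtracts. Returns the list of (c_k, g_k) and the residual.
    """
    residual = list(x)
    terms = []
    for _ in range(m):
        scale = max(abs(r) for r in residual)
        if scale == 0.0:
            break  # x is already represented exactly
        g = nearest_grid_vector([r / scale for r in residual], M)
        terms.append((scale, g))
        residual = [r - scale * gj for r, gj in zip(residual, g)]
    return terms, residual
```

After each step the sup norm of the residual shrinks by at least the factor $1/M$, so for $\|x\|_\infty \le 1$ the residual after $m$ steps has sup norm at most $M^{-m}$, and its $\ell_q$ norm is at most $n^{1/q} M^{-m}$.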

We shall give results about the $\ell_1$-Greedy Algorithm for the system $\mathcal{V}_3$. We consider this system in detail for the following reasons. It is a simple system which is easy to describe geometrically. Also, it is fairly easy to analyze the approximation properties of this system. Moreover, it turns out that this system gives geometric order of approximation (see for example Theorem 9.5 and Theorem 9.7), which we know is the best we can expect for general dictionaries (see Corollary 9.2).

Theorem 9.5. We have the estimate

(9.11)  V3)i  <  V3)i  <  (1  -  Wy)”* 

where $k := [\log_2(n + 1)]$.

The following lower estimate shows that (9.11) cannot be improved by replacing $\log_2(n + 1)$ by a slower growing function.

Theorem 9.6. Let $n = 4^k - 1$, with $k$ a positive integer. For any $m < 3k/8$, we have

    $\underline\gamma_m^1(B_1^n, \mathcal{V}_3)_1 \ge 1/2.$

We want to carry out an analysis similar to the above for the $\ell_2$-Greedy Algorithm (Pure Greedy Algorithm) and the dictionary $\mathcal{V}_3$.

Theorem 9.7. Let $k := [\log_2 n]$. Then,

(9.12)  m  =  1,2, - 

The following theorem shows that in a certain sense the estimates of Theorem 9.7 cannot be improved.

Theorem 9.8. Let $n = 2^k$ for some positive integer $k$. For any $m < k/2$, we have

    $\underline\gamma_m^2(B_2^n, \mathcal{V}_3)_2 \ge 1/2.$


Theorem 9.3 gives the upper estimate for $\sigma_m(B_2^n, \mathcal{D})_2$. In the particular case $\#\mathcal{D} = C^n$, $C > 3$, this theorem guarantees the existence of $\mathcal{D}$ such that

    $\sigma_m(B_2^n, \mathcal{D})_2 \le \left(\frac{2}{C-1}\right)^m.$

It is interesting to compare this estimate with the following lower estimate in the problem of selection of optimal basis (see [KT1]). For given $K$ there exists a positive $C(K)$ such that for any set of $S < K^n$ bases $\mathcal{B}^j$, $j = 1, \dots, S$, in $\mathbb{R}^n$ we have for each $m \le n/2$

    $\sup_{f \in B_2^n} \inf_{j} \sigma_m(f, \mathcal{B}^j)_2 \ge C(K).$

Open  problems. 

9.1. Characterize Banach spaces $X$ such that the X-Greedy Algorithm converges for all dictionaries $\mathcal{D}$ and each element $f$.

9.2. Characterize Banach spaces $X$ such that the Dual Greedy Algorithm converges for all dictionaries $\mathcal{D}$ and each element $f$.

9.3. Find necessary and sufficient conditions on a weakness sequence $\tau$ to guarantee convergence of the Weak Dual Greedy Algorithm in uniformly smooth Banach spaces $X$ with modulus of smoothness of fixed power type $q$, $1 < q \le 2$, ($\rho(u) \le \gamma u^q$) for all dictionaries $\mathcal{D}$ and each element $f \in X$.

9.4. Find the correct (in both parameters $n$ and $m$) order of decay of the quantities

    $\bar\gamma_m^p(B_p^n, \mathcal{V}_3)_p, \qquad \underline\gamma_m^p(B_p^n, \mathcal{V}_3)_p, \qquad p = 1, 2.$

10. Nonlinear m-term approximation and ε-entropy

In this section, we want to bring out the connection between approximation from a dictionary and $\epsilon$-entropy. We begin with covering numbers $N_\epsilon(F, \ell_p)$ for a set $F \subset \mathbb{R}^n$ and recall their definition. For each $\epsilon > 0$,

    $N_\epsilon(F, \ell_p) := \min\{N : F \subset \bigcup_{j=1}^{N} B_p(y^j, \epsilon)\}$

with the minimum taken over all sets $\{y^j\}_{j=1}^{N}$ of points from $\mathbb{R}^n$. Here $B_p(y^j, \epsilon)$ denotes the $\ell_p$-ball of radius $\epsilon$ with center $y^j$. By considering systems $\mathcal{D}$ consisting of the points $y^j$ we find

(10.1)    $\inf_{\#\mathcal{D} = N_\epsilon(F, \ell_p)} \sigma_1(F, \mathcal{D})_p \le \epsilon.$

In other words, the covering numbers immediately give estimates for 1-term approximation. We can extend the above observation to $m$-term approximation by using the concept of metric entropy. Let $X$ be a linear metric space and for a set $\mathcal{D} \subset X$, let $\mathcal{L}_m(\mathcal{D})$ denote the collection of all linear spaces spanned by $m$ elements of $\mathcal{D}$. For a linear space $L \subset X$, the $\epsilon$-neighborhood $U_\epsilon(L)$ of $L$ is the set of all $x \in X$ which are at a distance not exceeding $\epsilon$ from $L$ (i.e. those $x \in X$ which can be approximated to an error not exceeding $\epsilon$ by the elements of $L$). For any compact set $F \subset X$ and any integers $N, m \ge 1$, we define the $(N, m)$-entropy numbers

    $\epsilon_{N,m}(F, X) := \inf_{\#\mathcal{D} = N} \inf\{\epsilon : F \subset \bigcup_{L \in \mathcal{L}_m(\mathcal{D})} U_\epsilon(L)\}.$


We can express $\sigma_m(F, \mathcal{D})$ as

    $\sigma_m(F, \mathcal{D}) = \inf\{\epsilon : F \subset \bigcup_{L \in \mathcal{L}_m(\mathcal{D})} U_\epsilon(L)\}.$

It follows therefore that

    $\inf_{\#\mathcal{D} = N} \sigma_m(F, \mathcal{D}) = \epsilon_{N,m}(F, X).$

In other words, finding best dictionaries for $m$-term approximation of $F$ is the same as finding sets $\mathcal{D}$ which attain the $(N, m)$-entropy numbers $\epsilon_{N,m}(F, X)$. It is easy to see that $\epsilon_{m,m}(F, X) = d_m(F, X)$. This establishes a connection between $(N, m)$-entropy numbers and the Kolmogorov widths.

The present section contains an attempt to generalize the classical concept of Kolmogorov's width so that it can be used in estimating best $m$-term approximation. For this purpose we introduce a nonlinear Kolmogorov's $(N, m)$-width:

    $d_m(F, X, N) := \inf_{\Lambda_N,\ \#\Lambda_N \le N}\ \sup_{f \in F}\ \inf_{L \in \Lambda_N}\ \inf_{g \in L} \|f - g\|_X,$

where $\Lambda_N$ is a set of at most $N$ $m$-dimensional subspaces $L$. It is clear that

    $d_m(F, X, 1) = d_m(F, X)$

and

    $d_m\left(F, X, \binom{N}{m}\right) \le \epsilon_{N,m}(F, X) \le \sigma_m(F, \mathcal{D})$

for any $\mathcal{D}$ with $\#\mathcal{D} = N$. The new feature of $d_m(F, X, N)$ is that we are allowed to choose a subspace $L \in \Lambda_N$ depending on $f \in F$. It is clear that the bigger $N$ is, the more flexibility we have to approximate $f$. It turns out that from the point of view of our applications the following two cases

(I)    $N \asymp K^m$,

where $K > 1$ is a constant, and

(II)    $N \asymp m^{am}$,

where $a > 0$ is a fixed number, play an important role.

We intend to use the $(N, m)$-widths for estimating from below the best $m$-term approximations. There are several general results (see [L], [C]) which give lower estimates of the Kolmogorov widths $d_n(F, X)$ in terms of the entropy numbers $\epsilon_k(F, X)$. In [T16] we generalized the following inequality of Carl (see [C]): for any $r > 0$ we have

(10.2)    $\max_{1 \le k \le n} k^r \epsilon_k(F, X) \le C(r) \max_{1 \le m \le n} m^r d_{m-1}(F, X).$


We denote here for integer $k$

    $\epsilon_k(F, X) := \inf\{\epsilon : \exists f_1, \dots, f_{2^k} \in X : F \subset \bigcup_{j=1}^{2^k} (f_j + \epsilon B(X))\},$

where $B(X)$ is the unit ball of the Banach space $X$. For noninteger $k$ we set $\epsilon_k(F, X) := \epsilon_{[k]}(F, X)$ where $[k]$ is the integral part of the number $k$. It is clear that

    $d_1(F, X, 2^n) \le \epsilon_n(F, X).$


In [T16] we proved the inequality

(10.3)    $\max_{1 \le k \le n} k^r \epsilon_k(F, X) \le C(r, K) \max_{1 \le m \le n} m^r d_{m-1}(F, X, K^m),$

where we denote

    $d_0(F, X, N) := \sup_{f \in F} \|f\|_X.$

This inequality is a generalization of inequality (10.2). In [T16] we also proved the following inequality

(10.4)    $\max_{1 \le k \le n} k^r \epsilon_{(a+r)k\log k}(F, X) \le C \max_{1 \le m \le n} m^r d_{m-1}(F, X, m^{am})$

and gave an example showing that $k \log k$ in this inequality cannot be replaced by a slower growing function of $k$.

In [T16] we applied inequalities (10.3) and (10.4) for estimating the best $m$-term trigonometric approximation from below. As a corollary to the following version of (10.3) (see Theorem 10.1 below) we gave a new proof (see [DT1]) for the estimate

    $\sigma_m(W_\infty^r, \mathcal{T})_1 \gg m^{-r},$

where $W_\infty^r$ is a standard Sobolev class (see Section 2) with the restriction imposed in the $L_\infty$-norm.

Theorem 10.1. For any positive constant $K$ we have

    $\max_{1 \le k \le n} k^r \epsilon_k(F, X) \le C(r, K) \max_{1 \le m \le n} m^r d_{m-1}(F, X, (Kn/m)^m).$

We used in [T16] a version of (10.4) to get some new lower estimates of $m$-term trigonometric approximation in the $L_1$-norm of multivariate classes $MW_\infty^r$ of functions with bounded mixed derivative. We proved in [T16] that

(10.5)    $\sigma_m(MW_\infty^r, \mathcal{T})_1 \gg m^{-r}(\log m)^{r(d-2)}.$

The inequality (10.5) gives a new estimate for small $r$.

The above method can be applied to a general system $\Psi$ instead of the trigonometric system $\mathcal{T}$.

Assume a system $\Psi := \{\psi_j\}_{j=1}^{\infty}$ of elements in $X$ satisfies the condition:

(VP) There exist three positive constants $A_i$, $i = 1, 2, 3$, and a sequence $\{n_k\}_{k=1}^{\infty}$, $n_{k+1} \le A_1 n_k$, $k = 1, 2, \dots$, such that there is a sequence of de la Vallée-Poussin type operators $V_k$ with the properties

(10.6)    $V_k(\psi_j) = \lambda_{k,j}\psi_j,$

    $\lambda_{k,j} = 1$ for $j = 1, \dots, n_k$;  $\lambda_{k,j} = 0$ for $j > A_2 n_k$,

(10.7)    $\|V_k\|_{X \to X} \le A_3, \qquad k = 1, 2, \dots$

Theorem 10.2. Assume that for some $a > 0$ and $b \in \mathbb{R}$ we have

    $\epsilon_m(F, X) \ge C_1 m^{-a}(\log m)^b, \qquad m = 1, 2, \dots.$

Then if a system $\Psi$ satisfies the condition (VP) and also satisfies the following condition

    $E_n(F, \Psi) := \sup_{f \in F} \inf_{c_1, \dots, c_n} \|f - \sum_{j=1}^{n} c_j \psi_j\|_X \le C_2 n^{-a}(\log n)^b, \qquad n = 1, 2, \dots;$

then we have

    $\sigma_m(F, \Psi)_X \gg m^{-a}(\log m)^b.$

Open problem 10.1. The correct order of the quantity $\sigma_m(MW_\infty^r, \mathcal{T})_1$ is unknown.


11.  Bilinear  Approximation 

In this section we discuss one particular case of a dictionary. Denote by $\Pi$ the system of functions of the form $u(x_1)v(x_2)$. It is clear that $\mathcal{T}^2 \subset \Pi$. It is also clear that $\Pi$ is a very redundant system. We already mentioned some results for this system in the Introduction and in Section 8. All those results concerned approximation in the Hilbert space $L_2([0, 1]^2)$, and it was convenient for us to normalize elements of $\Pi$ in $L_2$ (which made the system $\Pi$ a dictionary in $L_2([0, 1]^2)$). In this section we consider approximation by $\Pi$ in all $L_p$, $1 \le p \le \infty$, spaces. In order to make the system $\Pi$ a dictionary in $L_p$ we need to normalize it in $L_p$. We will denote the system $\Pi$ normalized in $L_p$ by $\Pi_p$. Most results of this section give estimates for best $m$-term approximation. These results do not depend on the normalization of $\Pi$, and for convenience in such a case we will use the notation $\Pi$ without an index $p$. In this section we concentrate only on approximation of bivariate functions from standard function classes. We note that bilinear approximation is now a well established area and many estimates are proved in a general setting: $f$ is a function of $2d$ variables $x = (x_1, \dots, x_d)$, $y = (y_1, \dots, y_d)$; $\Pi$ is replaced by $\Pi^d := \{u(x)v(y)\}$; $L_p$ is replaced by $L_{p_1,p_2}$, where

    $\|f\|_{p_1,p_2} := \|\,\|f(\cdot, y)\|_{p_1}\|_{p_2}.$

The key role in bilinear approximation is played by the Schmidt formula (see Section 8)

(11.1)    $\sigma_m(f, \Pi)_2 = \left(\sum_{n=m+1}^{\infty} s_n(J_f)^2\right)^{1/2}.$

This formula implies in particular for $a > 0$

    $\sigma_m(f, \Pi)_2 \ll m^{-a} \iff s_n(J_f) \ll n^{-a-1/2}.$
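
Both directions of this relation between $\sigma_m(f,\Pi)_2$ and the singular numbers $s_n(J_f)$ can be read off from the Schmidt formula (11.1), using only that the $s_n(J_f)$ are nonincreasing. The following computation is a sketch of that argument, written for $a > 0$.

```latex
% If s_n(J_f) \ll n^{-a-1/2}:
\sigma_m(f,\Pi)_2^2 = \sum_{n>m} s_n(J_f)^2 \ll \sum_{n>m} n^{-2a-1} \ll m^{-2a}.
% If \sigma_m(f,\Pi)_2 \ll m^{-a}, then by monotonicity of s_n(J_f):
m\, s_{2m}(J_f)^2 \le \sum_{n=m+1}^{2m} s_n(J_f)^2 \le \sigma_m(f,\Pi)_2^2 \ll m^{-2a}
\;\Longrightarrow\; s_{2m}(J_f) \ll m^{-a-1/2}.
```

The bound for odd indices follows from $s_{2m+1}(J_f) \le s_{2m}(J_f)$.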

The following classes are well known and important in studying integral operators. We say that $J_f$ belongs to the Schatten $\nu$-class $S_\nu$ if

    $\sum_n s_n(J_f)^\nu < \infty.$

The Schmidt formula (11.1) allows us to prove the following result.

Theorem 11.1. For any $\nu < 2$ we have

    $J_f \in S_\nu \iff \sum_m (\sigma_m(f, \Pi)_2\, m^{-1/2})^\nu < \infty.$


This theorem is an analog of the following theorem (see [DT2]) for an orthonormal basis $\mathcal{B}$ for a Hilbert space $H$.

Theorem 11.2. For any $\beta < 2$ and any orthonormal basis $\mathcal{B}$ we have

    $f \in A_\beta(\mathcal{B}) \iff \sum_m (\sigma_m(f, \mathcal{B})\, m^{-1/2})^\beta < \infty.$


Theorem 11.2 is a generalization of Stechkin's result [St], which corresponds to $\beta = 1$ in Theorem 11.2. Let us present some general results for approximation in Banach spaces and then get as a corollary error estimates for approximation by $\Pi$ in $L_p$. We recall that $A_1(\mathcal{D})$ is a convex hull of $\mathcal{D}^{\pm}$. Similarly to the definition of $A_\beta(\mathcal{D})$ in Subsection 8.3 we define $A_\beta(\mathcal{D})$ in a Banach space $X$ with a dictionary $\mathcal{D}$. It is easy to derive (see an idea in [DT2, Theorem 3.3]) from Theorem 9.1 the following statement.

Theorem 11.3. Let $X$ be a uniformly smooth Banach space with the modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Then for any $f \in A_\beta(\mathcal{D})$, $0 < \beta \le 1$, we have

    $\sigma_m(f, \mathcal{D})_X \le C(q, \gamma)\, m^{1/q - 1/\beta} |f|_{A_\beta(\mathcal{D})}.$

In the particular case $X = L_p$, $1 < p < \infty$, $\mathcal{D} = \Pi_p$, Theorem 11.3 gives the estimate

(11.2)    $\sigma_m(f, \Pi)_p \le C(p)\, m^{\max(1/p, 1/2) - 1/\beta} |f|_{A_\beta(\Pi_p)}.$

This inequality gives the error estimate of best $m$-term approximation in terms of $|f|_{A_\beta(\Pi_p)}$, which is not well studied for $p \ne 2$. We will present some results on estimates for $\sigma_m(f, \Pi)_p$ in terms of standard Sobolev-Nikol'skii classes. The results from Section 5 (see (5.6)-(5.8)) indicate that bilinear approximation of $f(x - y)$ is closely connected with the Kolmogorov widths $d_m(F[f], L_p)$ and best $m$-term approximation of $f$ with regard to the trigonometric system. If $f \in H_q^R$ then $f(x - y) \in NH_q^{(R,R)}$. We get from [T9] that

(11.3)    $\sigma_m(NH_q^{(R,R)}, \Pi)_p \ll m^{-R + (1/q - \max(1/p, 1/2))_+}$

for $1 \le q \le p \le \infty$ with $R > R(q, p)$, $R(q, p) = 2(1/q - 1/p)$ for $1 \le q \le p \le 2$ and $R(q, p) = 1/q + \max(1/q, 1/2)$ for $p > 2$. Comparing (11.3) with (5.7) we see that the upper estimates for the wider class $NH_q^{(R,R)}$ have the same order as for the class $\{f(x - y),\ f \in H_q^R\}$. Further results for anisotropic classes $NH_{q_1,q_2}^{(R_1,R_2)}$ and their $2d$-dimensional generalizations can be found in [T9].

In the case $1 \le p \le q \le \infty$ we have

(11.4)    $\sigma_m(NH_q^{(R,R)}, \Pi)_p \asymp m^{-R}.$

A nontrivial estimate in (11.4) is the lower estimate for $p = 1$, $q = \infty$. This estimate and generalizations of (11.4) are obtained in [T11]. Let us present now results on approximation in the $L_2$-norm for general classes $NH_{q_1,q_2}^{(R_1,R_2)}$ (see [T9] and [T12]). We note that the study of $\sigma_m(NH_{q_1,q_2}^{(R_1,R_2)}, \Pi)_{p_1,p_2}$ is not complete. One of the open problems in this area is given in Open problem 11.7. Known results can be found in [T9] and [T12]. Denote $\eta_i := (1/q_i - 1/2)_+$, $i = 1, 2$.

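Theorem 11.4. Let $R_1 \le R_2$ and $R_1 > \eta_1$, $R_2 > \eta_2(1 - \eta_1/R_1)^{-1}$. Then

    $\sigma_m(NH_{q_1,q_2}^{(R_1,R_2)}, \Pi)_2 \asymp m^{-R_2(1 - \eta_1/R_1)}, \qquad 1 \le q_1, q_2 \le \infty.$

Theorem 11.5. Let $R_1, R_2$ be as in Theorem 11.4. Then

    $\sup_{f \in NH_{q_1,q_2}^{(R_1,R_2)}} s_m(J_f) \asymp m^{-R_2(1 - \eta_1/R_1) - 1/2}, \qquad 1 \le q_1, q_2 \le \infty.$

Theorem 11.6. Let $R_1 \ge R_2$, $R_2 > \eta_2$, $R_1 > \eta_1(1 - \eta_2/R_2)^{-1}$. Then

    $\sigma_m(NH_{q_1,q_2}^{(R_1,R_2)}, \Pi)_2 \asymp m^{-R_1(1 - \eta_2/R_2) + \eta_1 - \eta_2}, \qquad 1 \le q_1, q_2 \le \infty.$

Theorem 11.7. Let $R_1, R_2$ be as in Theorem 11.6. Then

    $\sup_{f \in NH_{q_1,q_2}^{(R_1,R_2)}} s_m(J_f) \asymp m^{-R_1(1 - \eta_2/R_2) + \eta_1 - \eta_2 - 1/2}, \qquad 1 \le q_1, q_2 \le \infty.$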


We give now some historical remarks on estimating eigenvalues and singular numbers of integral operators. We begin with the following theorem that is a corollary to the Weyl Majorant Theorem (see [GK, p.41]).

Theorem 11.8. Let $A$ be a compact (completely continuous) operator in a Hilbert space $H$. Suppose that

    $s_n(A) \ll n^{-r}, \qquad r > 0.$

Then

    $|\lambda_n(A)| \ll n^{-r}.$

Fredholm [F] proved that if the kernel $f(x, y)$ is a continuous function and satisfies the condition

    $\sup_{x,y} |f(x, y + t) - f(x, y)| \le C|t|^\alpha, \qquad 0 < \alpha \le 1,$

then for an arbitrary $p > 2/(2\alpha + 1)$ the series

    $\sum_{j=1}^{\infty} |\lambda_j(J_f)|^p$

converges.

Starting with that article, smoothness conditions with respect to one variable were imposed on the kernel. Weyl [We] proved the estimate

    $\lambda_n(J_f) = o(n^{-r-1/2})$

under the condition that the kernel $f(x, y)$ is symmetric and continuous and that $\partial^r f/\partial x^r$ is continuous. Let us introduce some more notation. Define $NH_{q_1,q_2}^{(R,0)}$ as follows: $f(x, y)$ belongs to this class if for all $y \in \mathbb{T}$ the function $f(\cdot, y)$ of $x$ belongs to the class $H_{q_1}^R B(y)$, and $B(y)$ is such that $\|B(y)\|_{q_2} \le 1$. We use here the following notation. For a function class $F$ and a number $B > 0$ we define $FB := \{f : f/B \in F\}$.

Hille and Tamarkin [HT] achieved significant progress. They proved, in particular, that for $1 < q \le 2$ and $R > 1$

    $\sup_{f \in NH_{q,q'}^{(R,0)}} |\lambda_n(J_f)| \ll n^{-R-1+1/q}(\log n)^R, \qquad q' = q/(q - 1),$

and they conjectured that the extra logarithmic factor can be removed or even replaced by a logarithmic factor with a negative power.

The next important step was taken by Smithies [Sm]. He proved the estimate

(11.5)    $\sup_{f \in NH_{q,2}^{(R,0)}} s_n(J_f) \ll n^{-R-1+1/q}, \qquad 1 < q \le 2, \quad R > 1/q - 1/2.$

Of later results we mention those of Gel'fond and M.G. Krein (see [GK, Ch. III, §10.4]), Birman and Solomyak [BS], and Cochran [Co].

We proved in [T9] the following estimate

    $\sigma_m(NH_{q_1,q_2}^{(R,0)}, \Pi)_{p_1,p_2} \asymp m^{-R + (1/q_1 - \max(1/2, 1/p_1))_+}$

for $1 \le q_1 \le p_1 \le \infty$, $1 \le q_2 = p_2 \le \infty$ and $R > r(q_1, p_1)$. We denote here $r(q, p) := (1/q - 1/p)_+$ for $1 \le q \le p \le 2$ or $1 \le p \le q \le \infty$ and $r(q, p) := \max(1/2, 1/q)$ otherwise. This inequality implies in particular that (11.5) holds also for $q = 1$.

We discuss now an application of bilinear approximation to the theory of widths. As we know, the starting point of this theory is a function class, say, the function class $W_q^r$. This function class can be associated with one function, the Bernoulli kernel $F_r(x - y)$ with

    $F_r(t) := 2\sum_{k=1}^{\infty} k^{-r}\cos(kt - \pi r/2).$

We have

    $W_q^r = \{f : f(x) = \hat f(0) + (2\pi)^{-1}\int_0^{2\pi} F_r(x - y)\varphi(y)\,dy,\ \|\varphi\|_q \le 1\}.$

In the development of approximation by trigonometric polynomials it was understood that the rate of decay of $E_n(f)$ of individual functions, say $E_n(F_r)$, is governed by smoothness properties of the function. It turned out that we have a similar phenomenon on a much more general level.

For a function $g \in L_1(\mathbb{T}^2)$ define a function class

    $W_q^g := \{f : f(x) = (2\pi)^{-1}\int_0^{2\pi} g(x, y)\varphi(y)\,dy,\ \|\varphi\|_q \le 1\}.$

We proved in [T7] that $F_r(x - y)$ is a typical representative of the following class of functions. Denote by $MH_1^{r_1,r_2}B$ the class of functions $g(x, y)$ such that $\|g\|_1 < \infty$,

    $\int_0^{2\pi} g(x, y)\,dx = \int_0^{2\pi} g(x, y)\,dy = 0$

(this condition is imposed only for convenience), and

    $\|\Delta_{t_1,t_2}^{l} g(x, y)\|_1 \le B|t_1|^{r_1}|t_2|^{r_2}, \qquad r_1, r_2 > 0, \quad l := \max([r_1], [r_2]) + 1,$

where $\Delta_{t_1,t_2}^{l}$ denotes the operator of the mixed difference of order $l$ in each variable with step $t_1$ in $x$ and step $t_2$ in $y$. We remark that the function $F_r(x - y)$ belongs to $MH_1^{r_1,r_2}B$ for any $r_1, r_2$ such that $r_1 + r_2 = r$. We proved in [T7] the following statement.

Theorem 11.9. For all $1 \le q, p \le \infty$ we have

    $\sup_{g \in MH_1^{r_1,r_2}} d_m(W_q^g, L_p) \asymp d_m(W_q^{r_1+r_2}, L_p)$

for $r_1 > 1$, $r_2 > 1 + \max(1/q, 1/2)$ for $2 \le q \le p \le \infty$ or $1 \le q \le 2 \le p \le \infty$ and $r_2 > 1$ otherwise.

Open problems.

11.1. Find necessary and sufficient conditions on a weakness sequence $\tau$ to guarantee convergence of the Weak Greedy Algorithm with regard to $\Pi_2$ for each $f \in L_2$.

11.2. Does the $L_p$-Greedy Algorithm with regard to $\Pi_p$ converge for each $f \in L_p$, $1 < p < \infty$?

11.3. Does the Dual Greedy Algorithm with regard to $\Pi_p$ converge for each $f \in L_p$, $1 < p < \infty$?

11.4. If the answer to Problem 11.3 is "yes" then find necessary and sufficient conditions on a weakness sequence $\tau$ to guarantee convergence of the Weak Dual Greedy Algorithm with regard to $\Pi_p$ for each $f \in L_p$.

11.5. Find necessary and sufficient conditions on a weakness sequence $\tau$ to guarantee convergence of the Weak Chebyshev Greedy Algorithm with regard to $\Pi_p$ for each $f \in L_p$.

11.6. Let $R_N$ be the Rudin-Shapiro polynomials (see Section 4). Prove that

    $\sigma_m(R_N(x - y), \Pi)_1 \gg N^{1/2}.$

11.7.  Find  the  order  of  the  sequence 

(11.6)  am(NH[Rl’R2\  n)PliOOJ  m  =  1,2,... 

in the case $R_1 < R_2$, $2 < p_1 < \infty$.

Comment. In the case $R_1 \ge R_2$ the order of (11.6) is known (see [T9]).

11.8. Study the efficiency of the Pure Greedy Algorithm ($L_2$-Greedy Algorithm) with regard to $\Pi_2$ for approximation of function classes $NH_{q_1,q_2}^{(R_1,R_2)}$ in the $L_{p_1,p_2}$-norm.

12.  Ridge  Approximation 

This section, similarly to Section 11, is devoted to approximation of functions of two variables. The results discussed here may be seen as one more example (in addition to Section 11) in the development of the following general approach in multivariate approximation: approximate functions of several variables by univariate functions. This idea is interesting from a theoretical point of view and also looks reasonable from a computational point of view. There are a number of different realizations of this approach in approximation theory. We mention some of them for illustration. We begin with the simplest one. S.N. Bernstein (see [Be]) suggested studying the following type of approximation to a continuous periodic function $f(x, y)$ of two variables

(12.1)    $E_{n,\infty}(f) := \inf_{\{c_k(y)\}} \|f(x, y) - \sum_{|k| \le n} c_k(y)e^{ikx}\|$

in the uniform norm $\|\cdot\|$. The approximant in (12.1) is a linear combination of products of univariate functions. The Bernstein setting of the problem (12.1) is a variant of the classical problem of bilinear approximation which was discussed in Section 11. The important feature of the problem of bilinear approximation is that the approximating system $\{u(x)v(y)\}_{u,v \in L_2}$ is highly redundant. However, as we have seen in Section 11, the redundancy did not hinder the development of a nice theory to solve the problem of best bilinear approximation in the $L_2$-norm. What really made this possible is the structure of the system. In this section we discuss approximation by a redundant system with quite a different structure. We approximate by linear combinations of ridge functions, i.e. functions $G(x)$, $x \in \mathbb{R}^2$, which can be represented in the form

(12.2)    $G(x) = g((x, e))$

where $g$ is a univariate function and its argument $(x, e)$ is the scalar product of $x$ and a unit vector $e \in \mathbb{R}^2$. We denote the set of functions of the form (12.2) by $\mathcal{R}$ and call it the system of ridge functions. The above mentioned approximation (approximation by ridge functions) also uses univariate functions, and the system $\mathcal{R}$ of all ridge functions is highly redundant. Unlike the bilinear approximation problem, we do not have a theory which provides (describes) the solution to the problem of best ridge approximation. In this section we confine ourselves to the case of functions of two variables and approximate only in the Hilbert space $L_2$. We note that approximation by ridge functions has received much attention recently for the following two reasons. The first is that a ridge function can be interpreted as a plane wave. This means that the problem of ridge approximation can be seen as a problem of representation of a general wave by plane waves. The second reason is that ridge approximation proved to be useful in neural networks approximation (see [DOP]).

There are some general results on approximation by linear combinations of elements of a redundant system in Hilbert space (see Theorem 11.3). These results are expressed in terms of the $A_\beta(\mathcal{D})$-quasinorm determined by a dictionary $\mathcal{D}$. Let $D := \{(x_1, x_2) : x_1^2 + x_2^2 \le 1\}$ be the unit disk and let $L_p(D)$, $1 \le p < \infty$, denote the Banach space with the norm

    $\|f\|_p := \|f\|_{L_p(D)} := \left(\frac{1}{\pi}\int_D |f(x)|^p\,dx\right)^{1/p}.$

From  this  point  on  we  denote  by  7 Zp  the  dictionary  for  Lp(D)  which  consists 
of  elements  of  the  system  1Z  normalized  in  Lp(D).  Similarly  to  the  bilinear  ap¬ 
proximation  we  use  the  notation  7 Z  instead  of  1ZP  when  we  talk  about  best  to- term 
approximations.  In  a  particular  case  X  =  Lp(D),  1  <  p  <  oo,  V  =  7 Zp  Theorem 
11.3  gives  the  estimate 

(12.3)  <ym(f,n)r  <  C(p)m”“<I*'V2)-1//3|/|Aj(Kji). 

This inequality gives an error estimate for best $m$-term approximation in terms of
$|f|_{\mathcal{A}_\beta(\mathcal{R}_p)}$, a quantity which is not well studied. In order to use this
general result we need to verify that a given function $f$ can be approximated by
functions which have a special representation (see the definition of
$\mathcal{A}_\beta(\mathcal{D})$), which in turn could be a nontrivial problem. We will present
some results on estimates for $\sigma_m(f, \mathcal{R})_p$ in terms of standard classes of
functions. In this section we deal with a function class which is defined in a way
standard for constructive approximation. We define the class of functions
$H^r_p(D)$ using the classical means of approximation, namely, algebraic
polynomials. Let $\mathcal{P}(n, 2)$ denote the set of algebraic polynomials

\[
\sum_{k + l \le n - 1} c_{k,l}\, x_1^k x_2^l
\]

of total degree at most $n - 1$. Denote by $H^r_p(D)$, $r > 0$, the set of all functions
$f \in L_p(D)$ which can be represented in the form

\[
f = \sum_{n=1}^{\infty} p_n, \qquad p_n \in \mathcal{P}(2^n, 2), \quad n = 1, 2, \dots,
\]

with $p_n$ satisfying the inequalities

\[
\|p_n\|_p \le 2^{-rn}.
\]

The following result (see [LS]) gives the upper estimates for
$\sigma_m(H^r_p(D), \mathcal{R})_p$ automatically.

Theorem 12.1. For any algebraic polynomial $p \in \mathcal{P}(N, 2)$ there exist $N$ univariate
polynomials $g_j$, $j = 0, \dots, N - 1$, of degree $N - 1$ with the following property:

\[
p(x) = \sum_{j=0}^{N-1} g_j\bigl((x, e_j^N)\bigr), \tag{12.4}
\]

where $e_j^N := (\cos(j\pi/N), \sin(j\pi/N))$.
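
The representation in Theorem 12.1 can be checked numerically for small N. The following Python sketch is only an illustration and is not the construction from [LS]. It assumes equally spaced directions e_j = (cos(j*pi/N), sin(j*pi/N)) and recovers the coefficients of the univariate polynomials g_j one homogeneous degree at a time by a least-squares solve. The helper names (ridge_directions, ridge_decompose, eval_poly, eval_ridge_sum) are hypothetical and introduced here.

```python
import numpy as np
from math import comb


def ridge_directions(N):
    """Assumed directions e_j = (cos(j*pi/N), sin(j*pi/N)), j = 0..N-1."""
    angles = np.arange(N) * np.pi / N
    return np.column_stack([np.cos(angles), np.sin(angles)])


def ridge_decompose(c):
    """Split a bivariate polynomial into N ridge polynomials.

    c is an (N, N) array; c[k, l] is the coefficient of x1^k * x2^l and
    entries with k + l > N - 1 are ignored.
    Returns G of shape (N, N): G[j, d] is the coefficient of t^d in g_j.
    """
    N = c.shape[0]
    e = ridge_directions(N)
    G = np.zeros((N, N))
    for d in range(N):
        # M[k, j] = coefficient of x1^k * x2^(d-k) in (x . e_j)^d
        M = np.array([[comb(d, k) * e[j, 0] ** k * e[j, 1] ** (d - k)
                       for j in range(N)] for k in range(d + 1)])
        rhs = np.array([c[k, d - k] for k in range(d + 1)])
        # d + 1 <= N equations in N unknowns; lstsq returns a minimal-norm solution
        G[:, d] = np.linalg.lstsq(M, rhs, rcond=None)[0]
    return G


def eval_poly(c, x):
    """Evaluate the polynomial with coefficients c[k, l], k + l <= N - 1, at x."""
    N = c.shape[0]
    return sum(c[k, l] * x[0] ** k * x[1] ** l
               for k in range(N) for l in range(N - k))


def eval_ridge_sum(G, x):
    """Evaluate sum_j g_j((x, e_j)) at x."""
    N = G.shape[0]
    t = ridge_directions(N) @ x  # inner products (x, e_j)
    return sum(G[j, d] * t[j] ** d for j in range(N) for d in range(N))
```

For a random coefficient array c of size N x N with small N, eval_poly(c, x) and eval_ridge_sum(ridge_decompose(c), x) should agree up to rounding error at any point x of the disk. For larger N the linear systems become ill-conditioned, so this sketch is meant for small examples only.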

This gives the estimate

\[
\sigma_m(H^r_p(D), \mathcal{R})_p \le C(r)\, m^{-r}. \tag{12.5}
\]
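
A short sketch of how Theorem 12.1 leads to (12.5) may be helpful. It assumes that $\mathcal{R}$ is the system of ridge functions, so that each term $g_j((x, e_j))$ in (12.4) is a single element of $\mathcal{R}$. The index $s$ and the explicit constant are introduced here for illustration only. Let $f = \sum_{n \ge 1} p_n \in H^r_p(D)$ and let $s \ge 0$ be an integer. The partial sum $\sum_{n \le s} p_n$ (zero for $s = 0$) belongs to $\mathcal{P}(2^s, 2)$, so by Theorem 12.1 it is a sum of at most $2^s$ ridge functions. Hence

\[
\sigma_{2^s}(f, \mathcal{R})_p \le \Bigl\| \sum_{n > s} p_n \Bigr\|_p \le \sum_{n > s} 2^{-rn} = \frac{2^{-r(s+1)}}{1 - 2^{-r}}.
\]

For $m \ge 1$ choose $s$ with $2^s \le m < 2^{s+1}$. Since $\sigma_m(f, \mathcal{R})_p$ does not increase with $m$,

\[
\sigma_m(f, \mathcal{R})_p \le \sigma_{2^s}(f, \mathcal{R})_p \le \frac{2^{-r(s+1)}}{1 - 2^{-r}} \le \frac{m^{-r}}{1 - 2^{-r}},
\]

which is an estimate of the form (12.5) with $C(r) = (1 - 2^{-r})^{-1}$.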

It turned out that in the case $p = 2$ the estimate (12.5) is sharp:

\[
\sigma_m(H^r_2(D), \mathcal{R})_2 \ge C(r)\, m^{-r}. \tag{12.6}
\]

The first result in this direction, a slightly weaker version of (12.6), was obtained
in [T13]. The estimate (12.6) was proved in [Ma]. The estimate (12.6) also follows
from the relation

\[
\sigma_m(f, \mathcal{R})_2 \ge C \inf_{p \in \mathcal{P}(3m, 2)} \|f - p\|_2 \tag{12.7}
\]

established in [O3] for radial functions $f$, $f(x_1, x_2) = h((x_1^2 + x_2^2)^{1/2})$.

We proved recently (see [MOT]) that the estimate (12.5) in the case $p = 2$ can
be realized by PGA:

\[
\sup_{f \in H^r_2(D)} \|f - G_m(f, \mathcal{R}_2)\|_2 \le C(r)\, m^{-r}. \tag{12.8}
\]
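
To make the object $G_m(f, \cdot)$ concrete, here is a minimal Python sketch of the Pure Greedy Algorithm in a finite-dimensional setting. It is a stand-in under stated assumptions and not the setting of [MOT]: functions are replaced by their samples at finitely many points of the disk, and the dictionary is a hypothetical finite family of sampled ridge monomials $(x, e)^d$, normalized in the discrete Euclidean norm. The names pure_greedy and toy_ridge_dictionary are introduced here for illustration.

```python
import numpy as np


def pure_greedy(f, dictionary, m):
    """Run m steps of the Pure Greedy Algorithm in R^n.

    f          : target vector, shape (n,)
    dictionary : array of shape (K, n) whose rows are unit-norm elements g
    Returns the approximant G_m and the residual norms ||f - G_k||, k = 0..m.
    """
    residual = np.array(f, dtype=float)
    approx = np.zeros_like(residual)
    norms = [np.linalg.norm(residual)]
    for _ in range(m):
        inner = dictionary @ residual        # inner products of the residual with every g
        j = int(np.argmax(np.abs(inner)))    # element with the largest |inner product|
        step = inner[j] * dictionary[j]      # projection of the residual onto that element
        approx += step
        residual -= step
        norms.append(np.linalg.norm(residual))
    return approx, norms


def toy_ridge_dictionary(points, n_angles, max_degree):
    """Hypothetical finite dictionary: sampled ridge monomials (x . e)^d.

    points is an array of shape (n, 2); directions e use n_angles equally
    spaced angles in [0, pi); each row is normalized in the Euclidean norm.
    """
    rows = []
    for a in np.arange(n_angles) * np.pi / n_angles:
        t = points @ np.array([np.cos(a), np.sin(a)])
        for d in range(max_degree + 1):
            g = t ** d
            rows.append(g / np.linalg.norm(g))
    return np.array(rows)


# Usage: sample points in the unit disk and approximate a radial function.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(400, 2))
pts = pts[(pts ** 2).sum(axis=1) <= 1.0]     # keep points inside the disk
D_toy = toy_ridge_dictionary(pts, n_angles=8, max_degree=4)
f_samples = np.exp(-(pts ** 2).sum(axis=1))  # a radial test function
G_m, residual_norms = pure_greedy(f_samples, D_toy, m=10)
```

Because every dictionary element has unit norm, each step removes the projection of the current residual onto the chosen element, so the residual norms cannot increase. The sketch says nothing about the rate in (12.8), which concerns the full dictionary $\mathcal{R}_2$ in $L_2(D)$.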

Let us make some comments on (12.8). First of all, this estimate shows that PGA
with regard to $\mathcal{R}_2$ is not saturated. Moreover, combining (12.8) with (12.7) we get
that for radial functions $f$ such that

\[
\sigma_m(f, \mathcal{R})_2 \le C(r)\, m^{-r} \tag{12.9}
\]

we have

\[
\|f - G_m(f, \mathcal{R}_2)\|_2 \le C(r)\, m^{-r}.
\]

This is a weaker analog of the $r$-greedy property for $\mathcal{R}_2$.

Open problems.

12.1. Find necessary and sufficient conditions on a weakness sequence $\tau$ to
guarantee convergence of the Weak Greedy Algorithm with regard to $\mathcal{R}_2$ for each
$f \in L_2$.

12.2. Does the $L_p$-Greedy Algorithm with regard to $\mathcal{R}_p$ converge for each
$f \in L_p$, $1 < p < \infty$?

12.3. Does the Dual Greedy Algorithm with regard to $\mathcal{R}_p$ converge for each
$f \in L_p$, $1 < p < \infty$?

12.4. If the answer to Problem 12.3 is "yes", then find necessary and sufficient
conditions on a weakness sequence $\tau$ to guarantee convergence of the Weak Dual
Greedy Algorithm with regard to $\mathcal{R}_p$ for each $f \in L_p$.

12.5. Find necessary and sufficient conditions on a weakness sequence $\tau$ to
guarantee convergence of the Weak Chebyshev Greedy Algorithm with regard to
$\mathcal{R}_p$ for each $f \in L_p$.

12.6. Find the order of the quantity

\[
\sup_{f \in \mathcal{A}_1(\mathcal{R}_2)} \|f - G_m(f, \mathcal{R}_2)\|_{L_2(D)}.
\]

12.7. Could the estimate (12.5) for $1 < p < \infty$ be realized by WCGA with
$\tau = \{t\}$, $0 < t < 1$?